Supervised Learning for Insider Threat Detection

MANOHARAN_Phavithra-Thesis.pdf - Submitted Version (2MB)

Manoharan, Phavithra (2024) Supervised Learning for Insider Threat Detection. PhD thesis, Victoria University.

Abstract

Cyberattacks cause havoc in the digital world, but the most significant threat may come from those who appear to be trustworthy: insiders. Insider threats pose a significant and evolving challenge to organisations, jeopardizing data security, operational processes, and overall well-being. Unlike external threats, they stem from individuals with authorized access and deep familiarity with internal systems, which makes them particularly difficult to detect and potentially more damaging. Insiders, including employees, contractors, and business partners, possess legitimate access to a company’s systems and data. When they act maliciously or negligently, they can cause significant damage through theft, sabotage, or espionage. Although machine learning and deep learning techniques are robust tools for detecting and preventing insider threats, they face several challenges. This thesis aims to highlight three significant challenges in insider threat detection and prediction. The first challenge is the lack of standardized datasets and problem settings for evaluating insider threat detection and prediction algorithms. This variability makes it difficult to compare the effectiveness of different approaches and to provide clear recommendations for decision-makers. To address this challenge, this study objectively evaluates the performance of supervised machine learning algorithms within a consistent experimental setting: the algorithms are implemented on the balanced CERT r4.2 dataset with a uniform feature extraction methodology. The impact of hyperparameter tuning on performance within the balanced dataset is also explored.
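The idea of a consistent experimental setting can be illustrated with a minimal sketch: every candidate is scored on the same balanced rows with the same metrics. The abstract does not say how the dataset was balanced, so random undersampling is an assumption here; the feature layout, the toy data, and the two stand-in "classifiers" (plain functions rather than trained models) are likewise hypothetical.

```python
import random

def undersample(rows, labels, seed=0):
    """Balance a binary dataset by randomly dropping majority-class rows.
    Assumption: the thesis does not state its balancing method."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]

def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive (malicious) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def compare(classifiers, rows, labels):
    """Score every classifier on the same rows with the same metrics."""
    return {
        name: precision_recall_f1(labels, [clf(row) for row in rows])
        for name, clf in classifiers.items()
    }

# Hypothetical rows: [after_hours_logons, usb_events]; label 1 = malicious.
rows = [[0, 0], [1, 0], [0, 1], [5, 3], [6, 2], [0, 0], [1, 1], [2, 0]]
labels = [0, 0, 0, 1, 1, 0, 0, 0]

bal_rows, bal_labels = undersample(rows, labels)
scores = compare(
    {
        "threshold_rule": lambda row: int(row[0] >= 3),  # stand-in model
        "always_benign": lambda row: 0,                  # trivial baseline
    },
    bal_rows,
    bal_labels,
)
```

Because both candidates see identical rows and identical metric definitions, any difference in `scores` reflects the candidates themselves rather than the evaluation setup.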
The second challenge concerns how insider threats are traditionally detected: user behaviours recorded in logs are analyzed, and a binary classifier is developed to differentiate between malicious and non-malicious individuals. However, existing approaches consider either standalone activities or sequential activities, but not both. To enhance the detection of malicious insiders, a novel bilateral insider threat detection method is proposed that uses recurrent neural networks (RNNs) and incorporates both standalone and sequential activities. First, behavioural characteristics are extracted from log files to represent the standalone activities. Then, RNN models are used to capture the features that represent sequential activities. Finally, the features obtained from standalone and sequential activities are merged, and a binary classification model is employed to detect insider threats. Experimental findings on the publicly available CERT r4.2 dataset demonstrate that the proposed bilateral approach significantly improves insider threat detection performance. The third challenge is that previous research has focused on pinpointing malicious actions that have already occurred and has provided limited assistance in preventing these risks. This research introduces a novel approach based on bidirectional long short-term memory (BiLSTM) networks, aiming to capture and analyse the patterns of individual actions and their sequential dependencies. The focus lies in predicting whether an individual will become a malicious insider in the future based on their daily behavioural records over the preceding several days. The performance of four supervised learning algorithms on manual features, sequential features, and the ground truth of the day, in various combinations, is analysed.
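The prediction task described above, in which a day's outcome is predicted from the preceding several days, can be framed as a sliding-window problem. The following is only a sketch of that framing: the window length (`history`), the per-day feature layout, and the toy values are hypothetical and are not taken from the thesis.

```python
def make_windows(daily_features, daily_labels, history=3):
    """Pair each day's label with the feature vectors of the `history` days before it."""
    samples = []
    for t in range(history, len(daily_features)):
        window = daily_features[t - history:t]     # days t-history .. t-1
        samples.append((window, daily_labels[t]))  # ground truth of day t
    return samples

# Hypothetical single-user log: each day is [logons, emails_sent, usb_events].
features = [[3, 10, 0], [4, 12, 0], [3, 9, 1], [7, 2, 4], [8, 1, 5]]
labels = [0, 0, 0, 1, 1]  # 1 = malicious on that day

samples = make_windows(features, labels, history=3)
```

Each element of `samples` is a (window, label) pair that a sequence model such as a BiLSTM could consume; the first `history` days produce no sample because they lack a full preceding window.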
Additionally, the performance of different recurrent models (RNN, LSTM, and BiLSTM) in combining manual features, sequential features, and the ground truth of the day is investigated. Moreover, the effect of different predictive lengths for the ground truth of the day and different embedded lengths for the sequential features is explored. All experiments are conducted on the CERT r4.2 dataset, and the results indicate that BiLSTM achieves the highest performance in combining these features. In summary, this research addresses three significant challenges in insider threat detection and prediction.
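The merging of standalone and sequential features described in the abstract can be sketched as a simple concatenation: a recurrent encoder summarises the activity sequence into a fixed-length vector, which is appended to the hand-crafted features before classification. The sketch below uses a plain Elman-style RNN with fixed, untrained weights purely for illustration; the thesis evaluates RNN, LSTM, and BiLSTM models, and all names, weights, and data here are hypothetical.

```python
import math

def rnn_encode(sequence, w_in, w_rec):
    """Run a minimal Elman-style RNN over a sequence and return the final hidden state."""
    hidden = [0.0] * len(w_rec)
    for x in sequence:
        # The right-hand side reads the previous hidden state before reassignment.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
            )
            for j in range(len(hidden))
        ]
    return hidden

def fuse(standalone, sequence, w_in, w_rec):
    """Concatenate hand-crafted features with the RNN's summary of the sequence."""
    return list(standalone) + rnn_encode(sequence, w_in, w_rec)

# Hypothetical inputs: three standalone counts and a sequence of one-hot activity codes.
standalone = [2.0, 0.0, 5.0]
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

# Fixed illustrative weights (hidden size 2, input size 2); real weights would be learned.
w_in = [[0.5, -0.3], [0.1, 0.8]]
w_rec = [[0.2, 0.0], [0.0, 0.2]]

fused = fuse(standalone, sequence, w_in, w_rec)
```

The resulting `fused` vector would then be passed to a binary classifier; in a trained system the encoder weights and the classifier would be fitted on labelled data rather than set by hand.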

Item type: Thesis (PhD thesis)
URI: https://vuir.vu.edu.au/id/eprint/48640
Subjects: Current > FOR (2020) Classification > 4604 Cybersecurity and privacy
Current > FOR (2020) Classification > 4611 Machine learning
Current > Division/Research > Institute for Sustainable Industries and Liveable Cities
Keywords: threat detection; machine learning; algorithms; recurrent neural networks; cyber security; artificial intelligence; deep learning