Applying clustering and ensemble clustering approaches to phishing profiling
Webb, Dean, Yearwood, John, Ma, Liping, Vamplew, Peter, Ofoghi, Bahadorreza and Kelarev, Andrei (2009) Applying clustering and ensemble clustering approaches to phishing profiling. Conferences in Research and Practice in Information Technology, 101. pp. 25-34. ISSN 1445-1336
Abstract
This paper describes a novel approach to profiling phishing emails based on the combination of multiple independent clusterings of the email documents. Each clustering is motivated by a natural representation of the emails. A data set of 2048 phishing emails provided by a major Australian financial institution was pre-processed to extract features describing the textual content, hyperlinks and orthographic structure of the emails. Independent clusterings using different techniques were performed on each representation, and these clusterings were then ensembled using a variety of consensus functions. This paper concentrates on using several clustering approaches to determine the most likely number of phishing groups and explores ways in which individual and combined results relate. The approach suggests a number of phishing groups and the structure of the approach can aid the development of profiles based on the individual clusters. The actual profiling is not carried out in this paper.
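To illustrate what ensembling independent clusterings with a consensus function can look like, the following is a minimal sketch and not the method used in the paper. The abstract does not name its consensus functions, so the sketch assumes one common choice, a co-association (evidence accumulation) scheme: two emails are grouped together when they share a cluster in more than a chosen fraction of the individual clusterings. The function names, the 0.5 threshold and the six-email toy labels are all hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of clusterings in which each pair of items shares a cluster."""
    n = len(labelings[0])
    m = len(labelings)
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    return [[c / m for c in row] for row in counts]


def consensus_clusters(labelings, threshold=0.5):
    """Merge items whose co-association exceeds `threshold`.

    The consensus groups are the connected components of the graph that
    links every such pair (tracked here with a small union-find).
    """
    n = len(labelings[0])
    co = co_association(labelings)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical cluster labels for six emails, one list per representation.
text_labels = [0, 0, 0, 1, 1, 1]
link_labels = [0, 0, 1, 1, 2, 2]
orth_labels = [1, 1, 1, 0, 0, 0]

consensus = consensus_clusters([text_labels, link_labels, orth_labels])
print(consensus)            # consensus label per email
print(len(set(consensus)))  # suggested number of groups
```

In this toy example the hyperlink clustering disagrees with the other two about emails 2 and 3, but the majority vote still yields two consensus groups. Raising the threshold demands stronger agreement between representations and generally produces more, smaller groups.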
Additional Information | This paper appeared at the Eighth Australasian Data Mining Conference (AusDM 2009), Melbourne, Australia. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 101, Paul J. Kennedy, Kok-Leong Ong and Peter Christen, Eds. |
Item type | Article |
URI | https://vuir.vu.edu.au/id/eprint/9716 |
Official URL | http://crpit.com/confpapers/CRPITV101Yearwood.pdf |
Subjects | Historical > FOR Classification > 1005 Communications Technologies; Historical > Faculty/School/Research Centre/Department > Institute of Sport, Exercise and Active Living (ISEAL) |
Keywords | ResPubID22603, clustering, phishing, graph partitioning, cluster ensembles, profiling, consensus functions |
Citations in Scopus | 21 |