Applying clustering and ensemble clustering approaches to phishing profiling

Full text for this resource is not available from the Research Repository.

Webb, Dean, Yearwood, John, Ma, Liping, Vamplew, Peter, Ofoghi, Bahadorreza and Kelarev, Andrei (2009) Applying clustering and ensemble clustering approaches to phishing profiling. Conferences in Research and Practice in Information Technology, 101. pp. 25-34. ISSN 1445-1336

Abstract

This paper describes a novel approach to profiling phishing emails based on the combination of multiple independent clusterings of the email documents. Each clustering is motivated by a natural representation of the emails. A data set of 2048 phishing emails provided by a major Australian financial institution was pre-processed to extract features describing the textual content, hyperlinks and orthographic structure of the emails. Independent clusterings using different techniques were performed on each representation, and these clusterings were then ensembled using a variety of consensus functions. This paper concentrates on using several clustering approaches to determine the most likely number of phishing groups and explores ways in which individual and combined results relate. The approach suggests a number of phishing groups, and the structure of the approach can aid the development of profiles based on the individual clusters. The actual profiling is not carried out in this paper.
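To make the idea of a consensus function concrete, the sketch below combines several base clusterings of the same emails into one labelling. It is an illustration under stated assumptions, not the authors' method: the abstract does not specify its consensus functions beyond the keyword "graph partitioning", so this swaps in a simpler, well-known technique, co-association (evidence accumulation) with a majority threshold. The three label lists standing in for the text, hyperlink and orthographic representations are invented toy data, and the function names `co_association` and `consensus_labels` are hypothetical.

```python
from itertools import combinations


def co_association(clusterings):
    """Return an n x n matrix whose entry (i, j) is the fraction of base
    clusterings that place items i and j in the same cluster.
    Cluster IDs only need to be consistent within one clustering."""
    m = len(clusterings)
    n = len(clusterings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in clusterings:
        if len(labels) != n:
            raise ValueError("all clusterings must label the same items")
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    co = [[counts[i][j] / m for j in range(n)] for i in range(n)]
    for i in range(n):
        co[i][i] = 1.0  # every item always co-occurs with itself
    return co


def consensus_labels(co, threshold=0.5):
    """Link every pair whose co-association is strictly above `threshold`
    and return the connected components as consensus labels 0, 1, 2, ..."""
    n = len(co)
    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots in order of first appearance.
    relabel = {}
    labels = []
    for i in range(n):
        root = find(i)
        if root not in relabel:
            relabel[root] = len(relabel)
        labels.append(relabel[root])
    return labels


# Hypothetical toy data: six emails clustered independently on three views.
text_view = [0, 0, 0, 1, 1, 1]
link_view = [0, 0, 1, 1, 2, 2]
ortho_view = [1, 1, 1, 0, 0, 0]

co = co_association([text_view, link_view, ortho_view])
labels = consensus_labels(co, threshold=0.5)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

With a threshold of 0.5 and a strict comparison, a pair is linked only when a majority of the views agree, so the two groups here survive even though the hyperlink view alone splits them into three. Sweeping the threshold (0.9 gives four groups on this toy data, 0.3 gives one) and looking for a range where the group count stays stable is one heuristic for estimating the number of groups; whether the paper uses anything like it is not stated in this record.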

Additional Information

This paper appeared at the Eighth Australasian Data Mining Conference (AusDM 2009), Melbourne, Australia. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 101, Paul J. Kennedy, Kok-Leong Ong and Peter Christen, Eds.

Item type Article
URI https://vuir.vu.edu.au/id/eprint/9716
Official URL http://crpit.com/confpapers/CRPITV101Yearwood.pdf
Subjects Historical > FOR Classification > 1005 Communications Technologies
Historical > Faculty/School/Research Centre/Department > Institute of Sport, Exercise and Active Living (ISEAL)
Keywords ResPubID22603, clustering, phishing, graph partitioning, cluster ensembles, profiling, consensus functions
Citations in Scopus 21