Hierarchical adaptive evolution framework for privacy-preserving data publishing

s11280-024-01286-z.pdf - Published Version (1MB)
Available under license: Creative Commons Attribution

You, Mingshan ORCID: 0000-0003-0958-528X, Ge, Yong-Feng, Wang, Kate ORCID: 0000-0001-5208-1090, Wang, Hua ORCID: 0000-0002-8465-0996, Cao, Jinli ORCID: 0000-0002-0221-6361 and Kambourakis, Georgios (2024) Hierarchical adaptive evolution framework for privacy-preserving data publishing. World Wide Web, 27 (4). ISSN 1386-145X

Abstract

The growing need for data publication and escalating concerns about data privacy have led to a surge of interest in Privacy-Preserving Data Publishing (PPDP) across research, industry, and government. Despite its significance, PPDP remains a challenging NP-hard problem, particularly for complex datasets, where traditional traversal search methods are often inefficient. Evolutionary Algorithms (EAs) have emerged as a promising response to this challenge, but their effectiveness, efficiency, and robustness in PPDP applications still need improvement. This paper presents a novel Hierarchical Adaptive Evolution Framework (HAEF) that optimizes t-closeness anonymization through attribute generalization and record suppression using a Genetic Algorithm (GA) and Differential Evolution (DE). The first hierarchy of HAEF balances GA and DE with a GA-prioritized adaptive strategy that strengthens exploration; combining the two algorithms aims to balance exploration and exploitation. The second hierarchy uses a random-prioritized adaptive strategy to select among distinct mutation strategies, thereby leveraging the advantages of each. Performance benchmark tests demonstrate the effectiveness and efficiency of the proposed technique. Across 16 test instances, HAEF significantly outperforms traditional depth-first search (DFS) and surpasses previous state-of-the-art EAs on most datasets. In terms of overall performance under the three privacy constraints tested, HAEF outperforms conventional DFS by an average of 47.78%, the state-of-the-art GA-based ID-DGA method by an average of 37.38%, and the hybrid GA-DE method by an average of 8.35% in TLEF. Furthermore, ablation experiments confirm the effectiveness of the individual strategies within the framework.
These findings improve the efficiency of the data publishing process while ensuring privacy and security and maximizing data availability.
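To make the privacy model in the abstract concrete, here is a minimal, hypothetical Python sketch of t-closeness enforced by attribute generalization and record suppression. It is not HAEF's implementation. It assumes a single quasi-identifier (age) with a made-up three-level generalization hierarchy, a categorical sensitive attribute (disease), and Earth Mover's Distance with equal ground distance between all values, in which case EMD reduces to half the L1 distance. Any equivalence class whose sensitive-value distribution lies farther than t from the whole table's distribution is suppressed. The function names and the toy table are illustrative assumptions.

```python
from collections import Counter, defaultdict

def distribution(values):
    """Empirical distribution of a list of categorical values."""
    counts = Counter(values)
    n = len(values)
    return {v: c / n for v, c in counts.items()}

def emd_equal_distance(p, q):
    """EMD between two categorical distributions when every pair of distinct
    values is at ground distance 1 (assumption); equals half the L1 distance."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def generalize_age(age, level):
    """Hypothetical hierarchy: 0 = exact age, 1 = 10-year band, 2 = fully generalized."""
    if level == 0:
        return str(age)
    if level == 1:
        low = (age // 10) * 10
        return f"{low}-{low + 9}"
    return "*"

def anonymize(records, level, t):
    """Generalize age to `level`, then suppress every equivalence class whose
    disease distribution is more than t (EMD) away from the whole table's."""
    overall = distribution([r["disease"] for r in records])
    classes = defaultdict(list)
    for r in records:
        classes[generalize_age(r["age"], level)].append(r)
    released, suppressed = [], 0
    for age_value, members in classes.items():
        local = distribution([m["disease"] for m in members])
        if emd_equal_distance(local, overall) <= t:
            released.extend({"age": age_value, "disease": m["disease"]} for m in members)
        else:
            suppressed += len(members)  # record suppression
    return released, suppressed

# Toy table (illustrative data only).
records = [
    {"age": 23, "disease": "flu"},
    {"age": 27, "disease": "cancer"},
    {"age": 35, "disease": "flu"},
    {"age": 38, "disease": "cancer"},
    {"age": 41, "disease": "flu"},
    {"age": 45, "disease": "flu"},
]
```

With t = 0.2 on this toy table, level 1 (10-year bands) releases four records and suppresses the 40-49 class, level 2 releases all six, and level 0 suppresses everything. A search method such as HAEF explores generalization choices across many attributes to trade information loss against suppression; the single-attribute example here only illustrates the constraint being checked.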
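The abstract describes two adaptive hierarchies: one that chooses between GA and DE with a GA-prioritized strategy, and one that chooses among DE mutation strategies with a random-prioritized strategy. Because the adaptation rule itself is not given here, the Python sketch below substitutes a common, generic technique, success-based probability matching, purely to illustrate the two-level idea. The class name `AdaptiveSelector`, the 3:1 prior in favor of GA, the uniform prior at the second level (one possible reading of "random-prioritized"), and the floor probability `p_min` are all hypothetical. The three operators are the standard DE/rand/1, DE/best/1, and DE/current-to-best/1 mutations; which operators HAEF actually uses is not stated in this abstract.

```python
import random

# Standard DE mutation operators on real-valued vectors.
# pop: list of equal-length float lists; i: target index; best: best vector;
# F: scale factor; rng: a random.Random instance.
def de_rand_1(pop, i, best, F, rng):
    a, b, c = rng.sample([j for j in range(len(pop)) if j != i], 3)
    return [pop[a][d] + F * (pop[b][d] - pop[c][d]) for d in range(len(pop[i]))]

def de_best_1(pop, i, best, F, rng):
    a, b = rng.sample([j for j in range(len(pop)) if j != i], 2)
    return [best[d] + F * (pop[a][d] - pop[b][d]) for d in range(len(pop[i]))]

def de_current_to_best_1(pop, i, best, F, rng):
    a, b = rng.sample([j for j in range(len(pop)) if j != i], 2)
    x = pop[i]
    return [x[d] + F * (best[d] - x[d]) + F * (pop[a][d] - pop[b][d]) for d in range(len(x))]

MUTATIONS = {
    "rand/1": de_rand_1,
    "best/1": de_best_1,
    "current-to-best/1": de_current_to_best_1,
}

class AdaptiveSelector:
    """Hypothetical probability-matching selector: each option is chosen with
    probability p_min + (1 - k * p_min) * weight / total_weight, and an option's
    weight grows by 1 whenever it produces an improved offspring."""

    def __init__(self, options, prior, p_min=0.05):
        self.options = list(options)
        self.weights = {o: float(prior[o]) for o in self.options}
        self.p_min = p_min

    def probabilities(self):
        total = sum(self.weights.values())
        k = len(self.options)
        return {o: self.p_min + (1 - k * self.p_min) * self.weights[o] / total
                for o in self.options}

    def choose(self, rng):
        probs = self.probabilities()
        return rng.choices(self.options, weights=[probs[o] for o in self.options])[0]

    def reward(self, option, improved):
        if improved:
            self.weights[option] += 1.0

# First hierarchy: GA vs DE, starting with a (hypothetical) 3:1 bias toward GA.
level1 = AdaptiveSelector(["GA", "DE"], prior={"GA": 3.0, "DE": 1.0})
# Second hierarchy: DE mutation operators, starting from a uniform prior.
level2 = AdaptiveSelector(MUTATIONS, prior={name: 1.0 for name in MUTATIONS})
```

In a full loop, each generation would call `level1.choose(rng)` and, when DE is picked, `level2.choose(rng)` to pick an operator, then reward whichever options produced an improved offspring. The floor `p_min` keeps every option selectable so the search never permanently abandons one. Since anonymization candidates are discrete generalization levels, a DE mutant vector would also need rounding and clipping to valid levels, which this sketch does not cover.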

Item type Article
URI https://vuir.vu.edu.au/id/eprint/48718
DOI 10.1007/s11280-024-01286-z
Official URL http://dx.doi.org/10.1007/s11280-024-01286-z
Subjects Current > FOR (2020) Classification > 4604 Cybersecurity and privacy
Current > FOR (2020) Classification > 4605 Data management and data science
Current > Division/Research > Institute for Sustainable Industries and Liveable Cities