Data-Driven Evolutionary Algorithm With Perturbation-Based Ensemble Surrogates

Data-driven evolutionary algorithms (DDEAs) aim to utilize data and surrogates to drive optimization, which is useful and efficient when the objective function of the optimization problem is expensive or difficult to access. However, the performance of DDEAs relies on their surrogate quality and often deteriorates if the amount of available data decreases. To solve these problems, this article proposes a new DDEA framework with perturbation-based ensemble surrogates (DDEA-PES), which contains two efficient mechanisms. The first is a diverse surrogate generation method that generates diverse surrogates by performing data perturbations on the available data. The second is a selective ensemble method that selects some of the prebuilt surrogates to form a final ensemble surrogate model. By combining these two mechanisms, the proposed DDEA-PES framework has three advantages: larger data quantity, better data utilization, and higher surrogate accuracy. To validate the effectiveness of the proposed framework, this article provides both theoretical and experimental analyses. For the experimental comparisons, a specific DDEA-PES algorithm is developed as an instance by adopting a genetic algorithm as the optimizer and radial basis function neural networks as the base models. The experimental results on widely used benchmarks and a real-world aerodynamic airfoil design optimization problem show that the proposed DDEA-PES algorithm outperforms some state-of-the-art DDEAs. Moreover, when compared with traditional non-data-driven methods, the proposed DDEA-PES algorithm requires only about 2% of the computational budget to produce competitive results.


I. INTRODUCTION
In recent years, data-driven evolutionary algorithms (DDEAs) have received increasing attention in solving many real-world optimization problems, such as trauma system optimization [1], air ventilation system design [2], blast furnace optimization [3], and many others [4]. This is mainly due to two reasons. First, evolutionary algorithms (EAs) are efficient tools for tackling optimization problems with different properties and challenges, such as large-scale [5]-[7], dynamic [8], multimodal [9]-[11], multiobjective [12]-[14], and many-objective [15] problems. Second, an increasing number of real-world optimization problems require distributed approaches [16], [17] and data-driven approaches [18], because their objective functions (and/or constraint functions) are often expensive, computationally intensive, or time consuming to evaluate. That is, evaluating the fitness (i.e., quality) of candidate solutions can be unaffordable in such real-world application problems [19]. For example, one evaluation of a high-fidelity crashworthiness analysis in the automotive industry can take several days and, therefore, completing 10^4 evaluations for a crashworthiness design can take more than 100 years [20], which is unrealistic for engineering production. Instead of using expensive fitness evaluations (FEs) and/or constraint evaluations, data-driven methods can provide cheaper and more efficient ways to carry out the evolutionary optimization. Specifically, based on some evaluated data (e.g., candidate solutions evaluated by real FEs), data-driven methods can build surrogate models to approximate or replace the real FEs to drive the evolution, which reduces the need for expensive FEs in the optimization procedure.
Therefore, by combining EAs and data-driven methods, DDEAs can be more promising and efficient than traditional methods, including traditional EAs, because DDEAs can drive the optimization through data and surrogates instead of performing expensive FEs [21], [22].
Generally speaking, existing research on improving DDEAs can be roughly classified into two categories. The first category mainly aims to improve data quality and data quantity, because data of higher quality and in larger amounts are useful for building more accurate surrogates [4]. Therefore, many data processing and data generation methods have been proposed, such as local smoothing methods [3], data mining techniques [18], and artificial data generation [23]. The second category aims to improve surrogate quality, for example, the accuracy and robustness of surrogates. To obtain better surrogates, users can select appropriate methods to build suitable surrogates, such as polynomial fitting [24], Kriging models [25], neural networks [26], and many others [27], [28]. In addition, when given a set of prebuilt homogeneous or heterogeneous surrogates, better models can be generated by managing and combining the prebuilt surrogates properly [29], [30]. However, some studies also show that ensemble surrogates that are more efficient than a single surrogate on theoretical problems (mathematical functions) may not always work better on real-world application problems, because the nature of each optimization problem may favor different surrogates [31]-[34]. Therefore, more intelligent surrogate ensemble methods still need to be studied. Moreover, as DDEAs rely heavily on surrogate predictions to evolve candidate solutions, their optimization accuracy may greatly deteriorate if they cannot make full use of the limited data to generate accurate surrogates.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
To solve the above problems, this article proposes a new DDEA framework with perturbation-based ensemble surrogates, called DDEA-PES. The proposed framework contains two efficient mechanisms. First, it employs a diverse surrogate generation (DSG) method to generate a set of diverse surrogates. This is achieved by first performing data perturbations on the given dataset to obtain diverse datasets and second training surrogates on each new dataset independently. Second, it adopts a selective ensemble (SE) method to select some of the prebuilt surrogates to form a final ensemble surrogate model. By combining the DSG and the SE, the proposed framework can have the following three advantages. First, if the given data are insufficient for building accurate surrogates, data perturbations can increase the data quantity for building better surrogates. Second, the algorithm can make full use of the given data by using DSG to generate a large number of diverse datasets from a given dataset and to obtain a set of diverse surrogates. Third, the SE can obtain a final surrogate model with higher effectiveness and efficiency by selecting and combining the better surrogates from prebuilt surrogates. To validate the proposed framework, this article provides theoretical analyses to study the effectiveness of data perturbations. Furthermore, experimental comparisons are performed on benchmark functions and a real-world aerodynamic airfoil design optimization problem to investigate the DDEA-PES. To conduct the experimental comparisons, a specific algorithm is developed based on the proposed framework, which employs genetic algorithm (GA) as the optimizer and radial basis function neural networks (RBFNNs) as the base models, respectively. In addition, state-of-the-art algorithms and traditional methods are adopted as contenders in the experimental comparisons.
The remainder of this article is organized as follows. Section II provides background knowledge and related work, while Section III introduces the DDEA-PES and gives theoretical analyses. Section IV presents the experiments and comparisons. Finally, Section V gives the conclusion and future work.

A. Data-Driven Evolutionary Algorithm
As the complexity and scale of optimization problems increase rapidly, FEs are increasingly expensive and difficult to access. In such cases, the performance of EAs often deteriorates due to the lack of enough FEs for evolving individuals, which poses great challenges to EAs in real-world applications [35]-[37]. Therefore, many EAs have been combined with data-driven methods such as surrogate models to improve their performance [38].
Usually, DDEAs aim to drive the evolutionary search based on candidate solutions that have been evaluated by real FEs [39]. This can be achieved by using surrogates to approximate and replace the real FEs as much as possible [18], [21]. In other words, by using evaluated data (i.e., the evaluated solutions) to build suitable surrogates, DDEAs can utilize these surrogates to drive the evolutionary search. As the data-driven mechanism reduces the need for expensive FEs, DDEAs can outperform many traditional EAs when solving computationally intensive, time-consuming, and expensive optimization problems [22].
According to whether new candidate solutions can be evaluated by exact FEs, DDEAs can be divided into two categories [4]: 1) online DDEAs and 2) offline DDEAs. In online DDEAs, some candidate solutions are still evaluated by real FEs during the optimization procedure. These newly evaluated solutions can be used as new training data to refine and improve the existing surrogate models, which can further increase the surrogate accuracy and search efficiency [39]. Therefore, online DDEAs are suitable for problems in which a few FEs are still available from physical experiments or expensive calculations [40]. In contrast, offline DDEAs build surrogate models based only on the historical evaluated data, and no new solutions are evaluated by real FEs during the optimization procedure [19]. Although online and offline DDEAs differ in their data collection, they process the collected data and build surrogates in similar ways. Therefore, methods proposed for processing data and managing surrogates in offline algorithms can be used in online algorithms as well. Based on this, without loss of generality, this article mainly details the proposed framework for offline DDEAs.

B. Related Work
This part reviews some relevant research and discusses the differences between it and our DDEA-PES. As briefly introduced in Section I, research for enhancing DDEAs can be generally classified into two categories: research on data and research on surrogates [41].
The first category aims to improve the quality and quantity of evaluated data [23], [41]. As the evaluated data are crucial for obtaining accurate surrogates, both the quality and quantity of evaluated data can have very significant influences on the optimization accuracy of DDEAs [4]. Consequently, many studies have been conducted to solve these problems. For poor-quality data, including data with imbalanced distribution [42], incomplete information [43], and noise [44], preprocessing methods can be helpful [45]. For example, in a blast furnace optimization application, Chugh et al. [3] smoothed the noisy data through a local regression and then built Kriging models as surrogates based on the processed data. As for big data applications [46], data mining and some related methods have been adopted to reduce data redundancy for building surrogates and the long calculation time for accessing big data. For instance, Wang et al. [18] used data mining techniques to recognize useful patterns in big data, which could save about 90% of the running time when optimizing a trauma system. In general, the above methods aim to improve data quality rather than data quantity. Therefore, they may not work well if the amount of available data is insufficient for obtaining high-quality surrogates. In contrast, the proposed DDEA-PES can make better use of the limited data through data perturbations, which can alleviate the data shortage problem.
As data shortage is often the largest challenge for approximating fitness functions, some studies attempt to solve this problem by generating additional data [4], [12]. For instance, Guo et al. [23] generated artificial data through a low-order polynomial model. Although this method has obtained promising results, its drawback is that the reliability of the generated data may depend heavily on the low-order model. Differently, Wang et al. [19] proposed an SE method, which generated diverse datasets through a bootstrap method. In this way, a number of different surrogates can be trained on these datasets, respectively, and then combined for predictions. The difference between this method and the DDEA-PES is that the bootstrap method obtains datasets by randomly resampling the evaluated data, while the DDEA-PES generates datasets by data perturbations. Besides, transfer learning techniques are also likely to be effective for alleviating the data shortage. For instance, Ding et al. [47] transferred knowledge from computationally cheap problems to expensive problems, which can improve surrogate accuracy. However, such knowledge transfer requires shared characteristics or features between the source and target problems. In other words, transfer learning methods are problem dependent. In contrast, the proposed DDEA-PES does not require such assumptions and therefore can be suitable for a wider range of problems.
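The contrast drawn above between bootstrap resampling and data perturbation can be illustrated with a minimal sketch. It is not the implementation of [19] or of DDEA-PES: the function names, the uniform random direction, and the choice to perturb every evaluated point (rather than a selected subset) are assumptions made only for illustration.

```python
import random


def bootstrap_dataset(ts, rng):
    """Resample the evaluated data with replacement; no new points appear."""
    return [rng.choice(ts) for _ in ts]


def perturbed_dataset(ts, max_len, rng):
    """Keep ts and add one nearby copy of every point, reusing its fitness.

    Hypothetical simplification: all points are perturbed, and each step has
    a Euclidean length of at most max_len.
    """
    generated = []
    for x, fx in ts:
        step = [rng.uniform(-1.0, 1.0) for _ in x]
        norm = sum(s * s for s in step) ** 0.5 or 1.0
        scale = max_len * rng.random() / norm
        generated.append(([xi + scale * s for xi, s in zip(x, step)], fx))
    return ts + generated


rng = random.Random(1)
ts = [([0.0, 0.0], 1.0), ([1.0, 2.0], 3.0), ([2.0, -1.0], 0.5)]
print(len(bootstrap_dataset(ts, rng)), len(perturbed_dataset(ts, 0.01, rng)))
```

The bootstrap set never contains a point outside the evaluated data, whereas the perturbed set grows the training data with new points that cost no extra FEs.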
The second category aims to obtain better surrogates based on given data. So far, many methods have been studied for choosing more suitable methods and models to build surrogates. These models can be polynomial regression [24], Kriging models [25], traditional interpolation methods [48], and many others [49]. Also, machine-learning techniques are popular for building surrogates, which include artificial neural networks [26] and RBFNNs [27], [28]. Moreover, Sun et al. [48] proposed a novel fitness approximation strategy based on particle swarm optimization (PSO), which could estimate fitness according to the positional relationship between particles in a PSO. However, the above studies have shown that each model has its own advantages and no surrogate model can be the best for all problems [4]. Therefore, many model management methods have been proposed to combine the advantages of different surrogates. For example, Wang et al. [29] achieved a better balance between global and local searches by combining global and local surrogates. Similarly, Sun et al. [30] designed a two-layer surrogate-assisted PSO (TLSAPSO) that adopted local surrogates to locate the global optimum and employed global surrogates to smooth out local optima. Also, the surrogate-assisted cooperative swarm optimization algorithm (SA-COSO) was proposed with an estimation method and RBFNNs for solving high-dimensional problems [28]. Furthermore, based on a set of surrogates, committee-based active learning for the surrogate-assisted PSO algorithm (CAL-SAPSO) employed committee-based decisions for predictions [39]. As online DDEAs are able to evaluate new data through real FEs and employ them to update surrogates, many methods have been studied for selecting the most suitable individuals for FEs. Based on the mechanism for selecting individuals, related methods can be classified into generation-based and individual-based strategies [4].
The generation-based strategies evaluate all solutions generation by generation, according to adaptive or predefined settings of the frequency for conducting evaluations [26]. In contrast, only some individuals are evaluated in individual-based strategies. In these strategies, the selection of individuals is often based on two factors: 1) promising individuals and 2) uncertain individuals [4]. The promising individuals, namely, the individuals with better predicted fitness, may provide more useful information to capture the exact optimum position [26], while uncertain individuals, namely, individuals with uncertain predictions, can provide information to increase surrogate accuracy in uncertain areas [50]. However, measuring uncertainty is not an easy problem. Hence, some methods are frequently used in many model management strategies to provide uncertainty measurements of predictions, such as the Gaussian process-assisted EA designed for medium-scale expensive problems (GPEME) [40]. Also, some studies measure the uncertainty based on the variance of surrogate outputs [39], [50]. In addition, a branch of strategies called infill criteria has been studied to consider the predicted fitness and the uncertainty together to combine their advantages, including expected lower confidence bound (LCB) [40], probability of improvement (PoI) [51], and expected improvement (ExI) [52]. Furthermore, based on this, multiobjective infill criteria have also shown effectiveness when minimizing fitness and uncertainty together [21]. In summary, the above methods mainly consider how to manage and update surrogate models based on given data by employing techniques from data analysis and knowledge discovery. Different from these existing methods, the DDEA-PES provides a novel and efficient way for model building and management, which can adaptively select the proper surrogate ensemble according to the problem at hand.
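For readers unfamiliar with the infill criteria named above, the following sketch gives their common textbook forms for a minimization problem, assuming the surrogate returns a Gaussian prediction with mean `mean` and standard deviation `std`. The function names and the LCB weight of 2.0 are illustrative assumptions, and the exact variants used in [40], [51], and [52] may differ.

```python
import math


def _normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lower_confidence_bound(mean, std, weight=2.0):
    """Optimistic fitness estimate; weight=2.0 is an assumed setting."""
    return mean - weight * std


def probability_of_improvement(mean, std, f_best):
    """Probability that the prediction falls below the best known fitness."""
    if std <= 0.0:
        return 1.0 if mean < f_best else 0.0
    return _normal_cdf((f_best - mean) / std)


def expected_improvement(mean, std, f_best):
    """Expected amount by which the prediction improves on f_best."""
    if std <= 0.0:
        return max(f_best - mean, 0.0)
    z = (f_best - mean) / std
    return (f_best - mean) * _normal_cdf(z) + std * _normal_pdf(z)


print(lower_confidence_bound(1.0, 0.5))
print(probability_of_improvement(0.0, 1.0, 0.0))
print(expected_improvement(0.0, 1.0, 0.0))
```

All three combine predicted fitness with predicted uncertainty, which is why they suit online DDEAs that must decide which individuals deserve a real FE.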

A. Overall Framework
The overall framework of DDEA-PES is shown in Fig. 1. As mentioned before, the model management methods for offline algorithms can also be used in online algorithms [19]. For simplicity, Fig. 1 therefore shows the offline version of DDEA-PES, in which all data evaluated by real fitness functions are denoted as original data.
In general, the DDEA-PES can be divided into two parts: 1) the employed EA and 2) the employed surrogate model. The employed EA in DDEA-PES is the same as traditional EAs, which includes initialization, variation, FE, and selection. Hence, DDEA-PES can use different EAs as the optimizers, such as PSO and GA.
The surrogate model employed in DDEA-PES is the perturbation-based ensemble surrogates (PES) proposed in this article. Given the original data, PES first performs the DSG based on data perturbations to generate a set of surrogate models. Then, it employs the SE to select some of the existing surrogates to form the final ensemble model. This ensemble model will be used to replace the real FEs in the selection part of the employed EA. Driven by the ensemble model, the employed EA iteratively evolves its individuals and finally outputs the solutions when the stop criteria are met. The data perturbation, DSG, and SE will be detailed in the following contents one by one.

B. Data Perturbation
The data perturbation aims to generate diverse datasets by perturbing the original data so that different surrogates can be built for ensemble selection. To describe the data perturbations, we use a method similar to that in [41] and introduce the notation used in the rest of this article. First, the data evaluated by real FEs and the data generated by data perturbations are called "original data" and "generated data," respectively. Second, the training dataset (TS) containing all the original data with the corresponding fitness is denoted as

$$TS = \{(x_i, F(x_i)) \mid i = 1, 2, \ldots, N\}$$

where $N$ is the total number of the original data and $F(x)$ is the fitness value of $x$. Third, a subset of TS that contains the data selected for data perturbations is denoted as $S$. Based on these notations, the dataset $K$ generated by data perturbations can be represented as follows:

$$K = \{x_{gen} \mid x_{gen} = x_s + \Delta x,\ \|\Delta x\| \le l,\ x_s \in S\} \tag{1}$$

where $\Delta x$ is a random vector and $l$ is the maximum length of $\Delta x$, which is set by (2) from the problem dimension $D$ and the upper bound $UB_i$ and lower bound $LB_i$ of the $i$th dimension. Then, the diverse training set (DTS) can be represented as the union of $K$ and TS

$$DTS = K \cup TS. \tag{3}$$

Notice that if $l$ in (1) is sufficiently small, the fitness of $x_{gen}$ and $x_s$ can be nearly the same for continuous fitness functions. Therefore, we set the fitness value of $x_{gen}$ to that of $x_s$, that is,

$$F(x_{gen}) \approx F(x_s) \tag{4}$$

where $F(x_{gen})$ and $F(x_s)$ are referred to as the exact fitness and the approximated fitness of $x_{gen}$, respectively, in the rest of this article. In this way, we can obtain the additional data (i.e., $x_{gen}$) without consuming FEs.

Now, we analyze the effect of such data perturbations. Considering a surrogate model $M$ and denoting its prediction on data $x$ as $M(x)$, its prediction error can be defined as

$$Err(M, F, x) = L(M(x), F(x))$$

where $L$ is a loss function. For simplicity, we consider the absolute loss function in this part, that is, $L(a, b) = |a - b|$ with $a$ and $b$ as real numbers. Then, given $x_{gen}$, which is generated by perturbing $x_s$, and its approximated fitness $F(x_s)$, the triangle inequality gives

$$Err_{appr}(M, F, x_{gen}) = |M(x_{gen}) - F(x_s)| \le |M(x_{gen}) - F(x_{gen})| + |F(x_{gen}) - F(x_s)| = Err_{exact}(M, F, x_{gen}) + |F(x_{gen}) - F(x_s)| \tag{5}$$

where $Err_{appr}$ and $Err_{exact}$ represent the prediction error when the fitness of the data is approximated and exact, respectively. For clarity, the following simply denotes $x_{gen}$ as $x$ when it belongs to $K$. Then, given a dataset $K$ that is generated through (1) with the data $x_s$ in $S$, the expected error produced by $M$ on $K$ can be defined as

$$E_{appr}(M, F, K) = \sum_{x \in K} p(x)\, Err_{appr}(M, F, x) \tag{6}$$

where $E_{appr}(M, F, K)$ represents the expected prediction error of $M$ on $K$ when the fitness of all data $x$ in $K$ is approximated, and $p(x)$ is the distribution of data $x$. Combining (5) and (6), we have

$$E_{appr}(M, F, K) \le E_{exact}(M, F, K) + \sum_{x \in K} p(x)\, |F(x) - F(x_s)| \tag{7}$$

where $E_{exact}(M, F, K) = \sum_{x \in K} p(x)\, Err_{exact}(M, F, x)$ represents the expected prediction error of $M$ on $K$ when the fitness of all data $x$ in $K$ is exact. As the data in TS will rarely coincide with one another, we can simply assume that the value of $p(x)$ is the same for all data in $K$. That is, data are uniformly distributed on $K$. Then, (7) can be rewritten as

$$E_{appr}(M, F, K) \le E_{exact}(M, F, K) + \frac{1}{|K|} \sum_{x \in K} |F(x) - F(x_s)| \tag{8}$$

where $|K|$ is the size of dataset $K$. Furthermore, according to (1), as the data in $S$ are finite, we can always find a constant $Q$ that satisfies

$$|F(x_{gen}) - F(x_s)| \le Q \cdot l \tag{9}$$

for any data $x_s$ in $S$ and for the corresponding $x_{gen}$. Combining (8) and (9), we have

$$E_{appr}(M, F, K) \le E_{exact}(M, F, K) + Q \cdot l. \tag{10}$$

The inequality given in (10) has two implications. First, when approximating continuous objective functions [which can have a small $Q$ to satisfy (9)], data perturbations can help obtain highly accurate surrogates with fewer FEs, because the training error of surrogates based on data with approximated fitness can be similar to that based on exact fitness. As an extreme example, when the approximated functions are constant functions ($Q$ can be zero), the surrogates trained on data with approximated fitness will be the same as those trained on exact fitness. Second, by properly controlling $l$, the training error obtained by surrogates based on data with approximated fitness can be kept similar to that based on exact fitness.
This provides a cost-effective way of building surrogates. Therefore, surrogates trained on datasets after perturbations can be similar to those trained on datasets with real fitness, where the former require fewer FEs than the latter.
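The bound discussed in this subsection can be checked numerically. The sketch below is only an illustration under stated assumptions: the fitness function (a sum of sines, for which $Q = \sqrt{D}$ is a valid constant), the deliberately crude surrogate, the perturbation radius `max_len`, and all function names are hypothetical choices, not settings from the article.

```python
import math
import random


def perturb(x_s, max_len, rng):
    """Return x_gen = x_s + dx, where dx has Euclidean length <= max_len."""
    direction = [rng.gauss(0.0, 1.0) for _ in x_s]
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    radius = max_len * rng.random()
    return [xi + radius * d / norm for xi, d in zip(x_s, direction)]


def fitness(x):
    """Assumed test function; |F(a) - F(b)| <= sqrt(D) * ||a - b||."""
    return sum(math.sin(xi) for xi in x)


def surrogate(x):
    """Deliberately crude stand-in for a trained model M."""
    return sum(x)


def mean_errors(selected, max_len, rng):
    """Average absolute error against approximated and exact fitness."""
    appr = exact = 0.0
    for x_s in selected:
        x_gen = perturb(x_s, max_len, rng)
        pred = surrogate(x_gen)
        appr += abs(pred - fitness(x_s))     # label copied from x_s
        exact += abs(pred - fitness(x_gen))  # label from a real FE
    return appr / len(selected), exact / len(selected)


rng = random.Random(0)
dim, max_len = 5, 1e-3
selected = [[rng.uniform(-3.0, 3.0) for _ in range(dim)] for _ in range(50)]
e_appr, e_exact = mean_errors(selected, max_len, rng)
q = math.sqrt(dim)
print(e_appr, e_exact, e_appr <= e_exact + q * max_len)
```

The gap between the two averages is controlled by `q * max_len`, so shrinking the perturbation radius brings the approximated-label error arbitrarily close to the exact-label error, at no additional FE cost.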

C. Diverse Surrogate Generation
Based on the data perturbations, diverse surrogates can be generated independently, as shown in Fig. 2. Algorithm 1 is the pseudocode of DSG. As minimization problems can be easily transformed into maximization problems, Algorithm 1 is for minimization problems without loss of generality. The inputs of DSG include a set of original data, TS, and the total number of surrogates to be obtained, T, while the output is a surrogate model set (SMS) with T diverse surrogates. In the implementation, the T surrogates are all RBFNNs [19] so that the algorithms can simply receive their parameters (i.e., the number and weights of neurons) and rebuild the same RBFNNs if needed. Furthermore, many studies have shown that, in both theory and practice, the linear combination of highly nonlinear models can dramatically decrease the variance of the generalization error [53], and RBFNNs are also nonlinear models.
In DSG, there are two main procedures. In the first procedure, some data of TS are selected as $S$ in (1). To obtain $S$, DSG builds the first surrogate $M_1$ on TS and then predicts the fitness of all data in TS as $Y_{pre}$. After this, it computes the prediction error, $diff = Y_{pre} - F(x)$, for all the data. The larger the diff is, the larger the prediction error is. Thus, data with larger diff are selected to construct $S$ for data generation. Herein, DSG adds the half of the data with the larger diff into $S$. The reason for using "a half" is that selecting too many data makes the algorithm time consuming while too few data may be insufficient; "a half" strikes a balance.
The second procedure of DSG is to generate DTS through data perturbations and then build surrogates based on DTS. For building each surrogate, DSG first sets $K$ as an empty set. Then, it generates data by performing perturbations on each data in $S$ and adds the generated data into $K$. After this, DSG obtains a new dataset DTS as the union of $K$ and TS and employs the DTS to build a new surrogate model. Finally, the built surrogate model is added to SMS. The above processes are performed iteratively until T surrogates are generated. Note that, as the first surrogate built only on TS is also added into the SMS, the total number of new surrogates built on different DTSs is T − 1.

Algorithm 2: SE
Input: $x_{best}$: the best solution in the evaluated dataset; $F(x_{best})$: the fitness value of $x_{best}$; T: the number of surrogate models to be selected; OSMS: the original surrogate model set containing more than T surrogates.
Output: SMS: the surrogate model set containing T surrogates.
Select the first T surrogates in OSMS to construct SMS;
End
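The two DSG procedures described in this subsection can be sketched as follows. This is a hedged illustration rather than Algorithm 1 itself: the base model is a Gaussian kernel smoother standing in for the RBFNNs of the article, the ranking uses the absolute prediction error, and the names `fit_kernel_smoother`, `perturb`, and `max_len` are hypothetical.

```python
import math
import random


def fit_kernel_smoother(data, width=1.0):
    """Stand-in base model (not an RBFNN): Gaussian-weighted mean of fitness."""
    points = list(data)

    def predict(x):
        weights = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2.0 * width ** 2))
            for p, _ in points
        ]
        total = sum(weights)
        if total == 0.0:  # far from all data: fall back to the plain mean
            return sum(f for _, f in points) / len(points)
        return sum(w * f for w, (_, f) in zip(weights, points)) / total

    return predict


def perturb(x_s, max_len, rng):
    """Return a point within Euclidean distance max_len of x_s."""
    direction = [rng.gauss(0.0, 1.0) for _ in x_s]
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    radius = max_len * rng.random()
    return [xi + radius * d / norm for xi, d in zip(x_s, direction)]


def diverse_surrogate_generation(ts, num_models, max_len, fit, rng):
    """Build num_models surrogates: one on ts, the rest on perturbed sets."""
    first = fit(ts)
    models = [first]
    # Procedure 1: S is the half of ts that the first surrogate predicts worst.
    ranked = sorted(ts, key=lambda d: abs(first(d[0]) - d[1]), reverse=True)
    selected = ranked[: len(ts) // 2]
    # Procedure 2: each new surrogate is trained on ts plus a fresh set K.
    while len(models) < num_models:
        generated = [(perturb(x, max_len, rng), fx) for x, fx in selected]
        models.append(fit(ts + generated))
    return models


rng = random.Random(0)
points = [[rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)] for _ in range(20)]
ts = [(x, sum(v * v for v in x)) for x in points]
sms = diverse_surrogate_generation(ts, 5, 1e-3, fit_kernel_smoother, rng)
print(len(sms))
```

Passing the training routine as the `fit` argument keeps the sketch independent of the base model, so a real RBFNN trainer could be substituted without changing the DSG logic.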

D. Selective Ensemble
In the literature, ensembles of surrogate models and ensemble learning methods have shown effectiveness in improving DDEAs [19], [50]. Therefore, this work also considers further enhancing the approximation accuracy through the combination of different surrogates. In this article, a straightforward SE method is proposed. That is, from a set of surrogates, the algorithm selects only some of them for use. The pseudocode of SE is shown as Algorithm 2. The inputs of SE include the best solution ($x_{best}$) and its fitness, the number of surrogates to be selected (T), and the original model set containing more than T surrogates, while its output is an SMS with T surrogates. The idea of SE is to select the surrogates with better accuracy. As DDEAs care about the surrogate accuracy in promising areas rather than in unpromising areas, the SE employs the best solution to measure the accuracy of different surrogates. Therefore, SE first computes the prediction given by each surrogate in OSMS on the best solution and then selects the surrogates with smaller prediction errors. Although the algorithm could perform SE and reselect the surrogates during optimization, how often SE should be performed can be problem dependent. Therefore, the SE used in this article is performed before the optimization, and its selection of surrogates is fixed during the whole optimization procedure. That is, we use the best solution in the historical data to preselect the surrogates before the evolutionary search starts. When evaluating a new candidate solution, the average of the predictions given by all the selected surrogates is used as the predicted fitness of that candidate solution.
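The selection and averaging steps described above can be written in a few lines. The sketch assumes each surrogate is a callable that maps a solution to a predicted fitness; the function names and the toy models in the usage part are hypothetical.

```python
def selective_ensemble(osms, x_best, f_best, num_selected):
    """Keep the surrogates whose prediction at x_best is closest to f_best."""
    ranked = sorted(osms, key=lambda model: abs(model(x_best) - f_best))
    return ranked[:num_selected]


def ensemble_predict(sms, x):
    """Predicted fitness: plain average over the selected surrogates."""
    return sum(model(x) for model in sms) / len(sms)


# Toy surrogates: a sphere function shifted by different constant biases.
biases = [0.5, -0.1, 2.0, 0.0]
osms = [lambda x, b=b: sum(v * v for v in x) + b for b in biases]

sms = selective_ensemble(osms, x_best=[0.0, 0.0], f_best=0.0, num_selected=2)
print(ensemble_predict(sms, [1.0, 1.0]))
```

In this toy case the two surrogates with biases 0.0 and -0.1 are kept, because they are the most accurate at the best known solution.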

A. Algorithm Settings
To test the proposed framework, we develop a specific algorithm for the experiments and comparisons, which adopts the GA with simulated binary crossover (GA-SBX) [54] and RBFNNs as the employed EA and the base models in the DDEA-PES framework, respectively. For configurations, all the RBFNNs are configured with D neurons, where D is the problem dimension. In addition, the maximum number of generations for GA-SBX is set to 500, and the numbers of surrogates built by DSG and selected by SE are set to 200 and 100, respectively.
For comparisons, state-of-the-art algorithms are adopted as competitors: CAL-SAPSO [39], SA-COSO [28], GPEME [40], DDEA-SE [19], and MGP-SLPSO [21]. These algorithms have different characteristics. First, CAL-SAPSO, which employs surrogates to make committee-based decisions, shares some characteristics with DDEA-PES in the model combination. Second, GPEME is an efficient online DDEA using Kriging models. Third, different from CAL-SAPSO and GPEME, which are designed for small- and medium-scale optimization, SA-COSO and MGP-SLPSO are online DDEAs for high-dimensional problems [28]. Fourth, DDEA-SE is a powerful offline DDEA, which can help to show the effectiveness of our DDEA-PES. All configurations of these DDEAs are set according to their corresponding references. In addition, the GA-SBX used in DDEA-PES is configured in the same way as that used in DDEA-SE [19], which makes the comparisons fairer. Specifically, the crossover parameter, mutation parameter, and population size of GA-SBX are 1.0, 1/D, and 100, respectively, with D as the dimension of the corresponding problem.
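As background for the optimizer named above, the following is a minimal sketch of the standard simulated binary crossover operator applied to every variable of two parents. The source states the crossover and mutation parameters but no distribution index, so `eta = 15.0` is an assumed value, and bound handling and mutation are omitted; this is not claimed to be the exact GA-SBX configuration of [19] or [54].

```python
import random


def sbx_pair(parent1, parent2, eta, rng):
    """Simulated binary crossover on each variable of two real-valued parents."""
    child1, child2 = [], []
    for a, b in zip(parent1, parent2):
        u = rng.random()
        # Spread factor: larger eta keeps children closer to their parents.
        if u <= 0.5:
            beta = (2.0 * u) ** (1.0 / (eta + 1.0))
        else:
            beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
        child1.append(0.5 * ((1.0 + beta) * a + (1.0 - beta) * b))
        child2.append(0.5 * ((1.0 - beta) * a + (1.0 + beta) * b))
    return child1, child2


rng = random.Random(3)
c1, c2 = sbx_pair([0.0, 1.0, -2.0], [1.0, 3.0, 4.0], eta=15.0, rng=rng)
print(c1, c2)
```

A useful property for checking an implementation is that the two children are symmetric about the parents' midpoint, so their per-variable sum equals that of the parents.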

B. Experimental Setup
The experiments first employ commonly used benchmark functions [19] to test the proposed algorithm, where the benchmarks are presented in Table I. Although these benchmark problems seem simple, they cover a wide range of problem characteristics (e.g., multimodal and nonseparable) for observing the features of different optimization algorithms. Furthermore, these problems can be extremely difficult to optimize when the number of available FEs is limited. In the experiments, 11 × D is the maximum number of available FEs for each algorithm, where D is the corresponding problem dimension. For offline DDEAs in particular, 11 × D data are sampled by Latin hypercube sampling (LHS) [55] before the evolution, and after this, no real FEs are performed. To reduce statistical errors, the average results over 25 independent runs are used for comparisons. For clearer comparisons, both the average and the standard deviation of the optimization error are presented. Besides, Wilcoxon's rank-sum tests with a significance level α = 0.05 are performed to make the comparisons statistically sound, where the symbols "+," "≈," and "−" indicate that the proposed DDEA-PES performs significantly better than, similar to, and significantly worse than the compared algorithm, respectively.
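The offline data collection step described above relies on Latin hypercube sampling. A minimal standard-library sketch of the basic LHS scheme is given below; the function name and the unit-box bounds in the usage lines are assumptions, and the implementation used for the reported experiments may differ.

```python
import random


def latin_hypercube(num_samples, lower, upper, rng):
    """Draw num_samples points so that each dimension hits every stratum once."""
    dim = len(lower)
    columns = []
    for j in range(dim):
        strata = list(range(num_samples))
        rng.shuffle(strata)  # independent random pairing per dimension
        width = (upper[j] - lower[j]) / num_samples
        columns.append([lower[j] + (k + rng.random()) * width for k in strata])
    return [[columns[j][i] for j in range(dim)] for i in range(num_samples)]


rng = random.Random(0)
samples = latin_hypercube(16, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], rng)
print(len(samples), len(samples[0]))
```

For a D-dimensional benchmark, calling the function with `num_samples = 11 * D` would produce an offline dataset of the size used in the experiments, with the real fitness function then evaluated once per sampled point.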

C. Effectiveness of DDEA-PES
This part assesses the effectiveness of the data-driven methods in DDEA-PES. In the experiments, the proposed method is compared with GA-SBX (without surrogates) and a random sample method (11 × D data sampled by LHS). GA-SBX and DDEA-PES differ only in that GA-SBX uses only real FEs for evolution while DDEA-PES employs only surrogates. In addition, the random sample method is actually the offline data utilized in DDEA-PES, which serves as a baseline for DDEA-PES. Besides, to observe the strength of DDEA-PES, GA-SBX using 110 × D and 550 × D FEs is also employed for comparisons, whereas DDEA-PES consumes only 11 × D FEs to obtain the offline data.
The comparison results are provided in Table II, with the best results marked in bold. Based on Wilcoxon's rank-sum tests, the DDEA-PES outperforms both the random sample method and the GA-SBX (with 11D FEs) on all the problems. Furthermore, the DDEA-PES performs generally better than the GA-SBX with 110D FEs and similarly to the GA-SBX with 550D FEs. That is, it can use about 10% or 2% of the FE budget to generate better or similar results, respectively, when compared with GA-SBX without surrogates. This shows the effectiveness of DDEA-PES.
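The two budget percentages quoted above follow directly from the FE counts, independently of the problem dimension $D$:

```latex
\frac{11D}{110D} = 0.10 = 10\%,
\qquad
\frac{11D}{550D} = 0.02 = 2\%.
```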

D. Comparisons With Offline DDEAs
This part compares the DDEA-PES with DDEA-SE on all the benchmark problems, the results of which are provided in Table III. It can be seen that although DDEA-SE is a state-of-the-art offline DDEA, the DDEA-PES can perform better

E. Comparisons With Online DDEAs
In this part, DDEA-PES is compared with state-of-the-art online DDEAs. Considering that online algorithms are proposed for different problems, that is, CAL-SAPSO and GPEME for low- and medium-dimensional problems while SA-COSO and MGP-SLPSO for high-dimensional problems, we divide the comparisons into two parts: problems with 10 and 30 dimensions and problems with 50 and 100 dimensions. Hence, CAL-SAPSO and GPEME are adopted in 10- and 30-D problems while SA-COSO and MGP-SLPSO are employed in 50- and 100-D problems. Table IV presents the comparison results on low- and medium-dimensional problems. When compared with GPEME, the DDEA-PES shows its strengths by significantly outperforming GPEME on all the ten benchmark problems. When compared with CAL-SAPSO, the DDEA-PES can still obtain the best results (in bold) on five of the ten problems. As the DDEA-PES here is implemented in an offline version, it may perform better when implemented in an online version that can evaluate candidate solutions to update surrogate models during the optimization process. In short, the comparisons with CAL-SAPSO and GPEME have shown the potential of the DDEA-PES. Table V presents the comparison results on high-dimensional problems. The results show that the DDEA-PES significantly outperforms SA-COSO on nine compared problems and is only significantly worse on one problem, which suggests the good performance of DDEA-PES on high-dimensional problems. When compared with MGP-SLPSO,

F. Contribution Analysis of Different Components
This part further studies the contributions and influences of the proposed DSG and SE individually. For this, DDEA-PES is compared with its variants without DSG, SE, or both of them. According to their components, these variants are denoted as DDEA-PES without DSG (DDEA-PES-without-DSG), DDEA-PES without SE (DDEA-PES-without-SE), and DDEA-PES without both DSG and SE (DDEA-PES-without-DSG-SE). All these variants use the same optimizer as that used in DDEA-PES.
The comparison results are provided in Table VI. Based on Wilcoxon's rank-sum test, the original DDEA-PES significantly outperforms DDEA-PES-without-DSG, DDEA-PES-without-SE, and DDEA-PES-without-DSG-SE on 13, 16, and 20 of the 20 tested problems, respectively. Moreover, over the 20 problems, DDEA-PES obtains the best results (marked in bold) on 13 problems. These results show that both the DSG and the SE contribute to the performance of DDEA-PES and that removing either of them degrades the algorithm performance.

G. Effects of Different Area Sizes for Data Perturbation
To further study the effect of different area sizes for data perturbations, DDEA-PES is compared with its variants with different $l$ for perturbations, where the original $l$ is denoted as $l_0$ [refer to (1)]. The comparison results provided in Table VII show that different $l$ can be suitable for different problems and different dimensions. For example, DDEA-PES with $l = 10^{2} \cdot l_0$ outperforms $l = 10^{-2} \cdot l_0$ on the 10-D Ellipsoid problem, while it performs worse than $l = 10^{-2} \cdot l_0$ on the 100-D Ellipsoid problem. Also, $l = 10^{2} \cdot l_0$ performs better than $l = 10^{4} \cdot l_0$ on the 10-D Ellipsoid problem but worse on the 100-D Rosenbrock problem. Furthermore, as $l = l_0$ can, in general, outperform other $l$ values, $l$ is recommended to be configured according to (2). In summary, the proposed data perturbation can improve the optimization accuracy of DDEAs.
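To make the role of the area size concrete, the following minimal sketch perturbs one data point for several scalings of a baseline size. The perturbation rule of this article is not restated in this section, so the sketch assumes a uniform shift within [-l/2, l/2] per dimension, clipped to the search box; the function `perturb`, the baseline value `l0`, and the bounds are all hypothetical.

```python
import random


def perturb(x, l, lower, upper, rng):
    """Shift each coordinate uniformly within [-l/2, l/2] and clip to the box.

    This uniform rule is an assumption made for illustration; it is not
    claimed to be the perturbation rule defined in the article.
    """
    shifted = []
    for xi, lo, hi in zip(x, lower, upper):
        yi = xi + rng.uniform(-l / 2.0, l / 2.0)
        shifted.append(min(max(yi, lo), hi))
    return shifted


rng = random.Random(0)           # fixed seed for repeatability
x = [0.5, -1.0, 2.0]             # one evaluated data point (toy values)
lower = [-5.12] * 3              # hypothetical search bounds
upper = [5.12] * 3
l0 = 0.01                        # hypothetical baseline area size

# Scalings of l0 corresponding to the variants discussed above.
for scale in (1e-2, 1.0, 1e2, 1e4):
    x_new = perturb(x, scale * l0, lower, upper, rng)
    max_shift = max(abs(a - b) for a, b in zip(x, x_new))
    print(f"l = {scale:g} * l0, largest coordinate shift = {max_shift:.6f}")
```

Under this assumed rule, a very small l yields perturbed points that nearly duplicate the original data, whereas a very large l spreads them over the whole search box, which is one plausible reason why an intermediate l tends to work best.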

H. Effects of Different Criteria for Constructing Dataset
This part considers the effects of different criteria for constructing the dataset S (refer to Algorithm 1). As the criterion employed in the original DDEA-PES is $\mathit{diff} = Y_{\mathrm{pre}} - F(x)$, its variants with $\mathit{diff} = (Y_{\mathrm{pre}} - F(x))^{2}$, $\mathit{diff} = F(x)$, and $\mathit{diff} = -F(x)$ are employed for comparisons. The results provided in Table VIII show that the original $\mathit{diff}$ outperforms the others. Based on Wilcoxon's rank-sum tests on the 20 problems, the original $\mathit{diff}$ performs significantly better than $\mathit{diff} = (Y_{\mathrm{pre}} - F(x))^{2}$, $\mathit{diff} = F(x)$, and $\mathit{diff} = -F(x)$ on 17, 14, and 20 problems, respectively. Furthermore, the original $\mathit{diff}$ obtains the best average results (marked in bold) on 11 of the 20 problems. This indicates that the choice of which data to perturb can be crucial to the surrogate accuracy and that, in general, the selection criterion employed in this article is effective and efficient.
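The four criteria can differ considerably in which data they favor, as the following minimal sketch illustrates. How Algorithm 1 consumes the criterion is not restated in this section, so the sketch simply assumes that the m data with the largest criterion values are chosen for perturbation; the names `CRITERIA` and `select_for_perturbation` and all numbers are hypothetical.

```python
# Candidate criteria; y_pre is a surrogate prediction and f is the real fitness.
CRITERIA = {
    "signed_error": lambda y_pre, f: y_pre - f,          # original criterion
    "squared_error": lambda y_pre, f: (y_pre - f) ** 2,
    "fitness": lambda y_pre, f: f,
    "neg_fitness": lambda y_pre, f: -f,
}


def select_for_perturbation(y_pre, f_true, criterion, m):
    """Return indices of the m data with the largest criterion value.

    The 'largest m' rule is an assumption made for illustration only.
    """
    score = CRITERIA[criterion]
    scores = [score(yp, f) for yp, f in zip(y_pre, f_true)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:m]


# Toy predictions and real fitness values for four data points.
y_pre = [1.0, 4.5, 2.5, 0.5]
f_true = [2.0, 3.0, 2.5, 3.5]

for name in CRITERIA:
    print(name, select_for_perturbation(y_pre, f_true, name, m=2))
```

On these toy values, the signed criterion favors data whose fitness is overestimated by the surrogate, whereas the squared criterion favors the largest errors regardless of sign, so the two select different data.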

I. Effects of Selected Surrogate Number
This part studies the effects of the number of surrogates selected for building the final ensemble model. For this, we compared the performance of DDEA-PES with different numbers of selected surrogates, including 5, 10, 15, 20, 50, 100 (the original setting), and 150, on the 10- and 100-D Ellipsoid, Ackley, and Rastrigin problems. The comparison results are provided in Fig. 3, which indicates that the influence of the selected surrogate number is strongly related to the problem dimension. For example, 150 surrogates obtain average results similar to those of ten surrogates on the 10-D Ellipsoid problem, while 150 surrogates outperform ten surrogates on the 100-D Ellipsoid problem. Similarly, on the Ackley problem, selecting ten surrogates is better for the 10-D case, while selecting 100 surrogates is recommended for the 100-D case. A likely reason is that the problem complexity increases rapidly as the dimension increases. In such circumstances, a number of selected surrogates that is sufficient for low-dimensional problems may be insufficient for high-dimensional problems. Therefore, it is suggested to use more surrogates when approximating higher dimensional and more complex problems.
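The effect of the selected surrogate number can be explored with a minimal selective-ensemble sketch such as the one below. The selection criterion of the SE is not restated in this section, so the sketch assumes that surrogates are ranked by their RMSE on the available data and that the k best are averaged with equal weights; the toy surrogates and the names `rmse` and `select_ensemble` are hypothetical.

```python
import math


def rmse(model, data):
    """Root-mean-square error of a model on (x, y) pairs."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in data) / len(data))


def select_ensemble(models, data, k):
    """Keep the k models with the lowest RMSE and average their outputs.

    Ranking by RMSE and equal-weight averaging are assumptions made for
    illustration; they are not claimed to be the rule used by the SE.
    """
    ranked = sorted(range(len(models)), key=lambda i: rmse(models[i], data))
    chosen = ranked[:k]

    def ensemble(x):
        return sum(models[i](x) for i in chosen) / len(chosen)

    return chosen, ensemble


def sphere(x):
    """Toy objective standing in for an expensive fitness function."""
    return sum(v * v for v in x)


def make_biased_surrogate(bias):
    """Toy surrogate: the true function plus a constant bias."""
    return lambda x: sphere(x) + bias


# Five prebuilt toy surrogates of varying quality and a small data set.
models = [make_biased_surrogate(b) for b in (-0.3, 0.1, 0.5, -0.05, 1.0)]
data = [(x, sphere(x)) for x in ([0.0, 0.0], [1.0, 2.0], [-1.5, 0.5])]

chosen, ensemble = select_ensemble(models, data, k=2)
print("selected surrogates:", chosen)
print("ensemble prediction at [1, 2]:", ensemble([1.0, 2.0]))
```

Varying k in such a sketch shows the tradeoff discussed above: a small k keeps only the most accurate surrogates, while a larger k averages over more, and possibly weaker, models.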

J. Aerodynamic Airfoil Design Optimization
This part employs an aerodynamic airfoil design optimization problem to test the performance of DDEA-PES in real-world applications [56]. As predicting the performance of a candidate airfoil shape design requires time-consuming computational fluid dynamics simulations, it is ideal to employ DDEAs to solve the aerodynamic airfoil design optimization problem.
The objective of the problem is to optimize the geometry of the airfoil to maximize the lift-over-drag ratio at predefined transonic flow situations. The airfoil geometry can be defined by the base airfoil case and ten additional controlling variables of the Hicks-Henne bump functions [57], [58]. Therefore, the problem can be simply formulated as in (11), where $C_{L/D}$ is the fitness denoting the lift-over-drag ratio, $\theta_1, \theta_2, \ldots, \theta_{10}$ are the ten decision variables, Hicks() represents the Hicks-Henne bump function set, and the predefined parameters M, Re, and AoA are the Mach number, the Reynolds number, and the angle of attack, respectively [57], [58].
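As an illustration of how bump-function variables deform a base shape, the following minimal sketch adds ten Hicks-Henne bumps to a single surface. It uses the common form sin^t(pi * x^m) with m = ln(0.5)/ln(x_p), which peaks at the chordwise location x_p. The peak locations, the bump width, the amplitudes, and the use of a symmetric NACA four-digit thickness curve as a stand-in base shape are assumptions; the exact parameterization in [57], [58] may differ.

```python
import math


def hicks_henne_bump(x, peak, width=3.0):
    """Single Hicks-Henne bump on (0, 1) with maximum value 1 at x = peak."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    m = math.log(0.5) / math.log(peak)   # places the maximum at x = peak
    return math.sin(math.pi * x ** m) ** width


def naca_thickness(x, t=0.12):
    """Half-thickness of a symmetric NACA four-digit airfoil (t = 0.12)."""
    return 5.0 * t * (0.2969 * math.sqrt(x) - 0.1260 * x - 0.3516 * x ** 2
                      + 0.2843 * x ** 3 - 0.1015 * x ** 4)


def perturbed_surface(x, thetas, peaks):
    """Base surface plus a weighted sum of bumps (one weight per variable)."""
    bumps = sum(th * hicks_henne_bump(x, p) for th, p in zip(thetas, peaks))
    return naca_thickness(x) + bumps


# Ten hypothetical peak locations spread along the chord.
peaks = [(i + 1) / 11.0 for i in range(10)]
# Ten hypothetical decision variables (bump amplitudes).
thetas = [0.0, 0.002, 0.0, -0.001, 0.0, 0.0, 0.003, 0.0, 0.0, 0.0]

for x in (0.05, 0.25, 0.5, 0.75):
    base = naca_thickness(x)
    new = perturbed_surface(x, thetas, peaks)
    print(f"x = {x:.2f}: base = {base:.5f}, perturbed = {new:.5f}")
```

In an actual study, the deformed coordinates would be passed to the flow solver to obtain the lift-over-drag ratio; that step is omitted here.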
In the experiments, the NACA 0012 airfoil [59] is adopted as the base case and the parameters M, Re, and AoA in (11) are set as 0.5, $5 \times 10^{6}$, and 4°, respectively. The fitness value $C_{L/D}$ of each airfoil design is obtained by simulation in the software Xfoil [60]. For this 10-D problem, the maximum number of evaluations is 110 (i.e., 11 × D) for all algorithms, where offline DDEAs sample 110 data before the optimization while online DDEAs spend 50 evaluations (i.e., 5 × D) on initial sampling if their algorithms do not have specific settings. All algorithms are run 25 times and the average results are provided for comparisons.
In addition, as SA-COSO and MGP-SLPSO are developed for high-dimensional problems, they are not adopted in this part. Instead, we add three online DDEAs that have been examined on real-world problems. They are EAS-SM3, EAS-SM5, and EAS-SM12, which are the best three among 12 algorithms on a real-world application optimization [32]. EAS-SM3 and EAS-SM5 each employ a single surrogate, based on a cubic RBFNN and on a Kriging model with Gaussian correlation and first-order polynomial, respectively, while EAS-SM12 adopts ensemble surrogates with optimal weights.
The experimental results are provided in Table IX. Among the tested algorithms, the proposed DDEA-PES produced the best $C_{L/D}$ of 105.03 and obtained the best performance in terms of the median and mean results, showing the effectiveness of DDEA-PES. Moreover, the geometry and pressure coefficient of the original NACA 0012 and of the best optimized airfoil obtained by DDEA-PES are plotted in Fig. 4 for visualization. It can be seen that the optimized airfoil has a smoother change of pressure coefficient (e.g., at x = 0.05) than the original airfoil, which can lead to a considerable improvement of airfoil performance.
Overall, the above results have verified the effectiveness of DDEA-PES in the real-world airfoil design application problem.

V. CONCLUSION
In this article, a novel and efficient framework called DDEA-PES is proposed with two efficient mechanisms. The first is the DSG that can generate diverse surrogates based on a limited dataset, which can make better use of the limited data and improve surrogate qualities. The second is the SE that selects some of the prebuilt surrogates to form the final ensemble model in order to further improve the model's effectiveness and efficiency.
For future work, the proposed algorithm will be extended to more complicated real-world application problems. Moreover, other types of surrogates will be attempted as the base model to further improve the algorithm accuracy and efficiency. In addition, other methods such as semisupervised regression will be studied to better deal with the data shortage problems when building surrogates.