1. Introduction
Extreme learning machine (ELM) was proposed by Huang et al. [1–3] as a promising learning algorithm for single-hidden-layer feedforward neural networks (SLFNs): it randomly chooses the input weights and biases of the hidden nodes and analytically determines the output-layer weights using the Moore-Penrose (MP) generalized inverse [4]. By avoiding iterative parameter adjustment and time-consuming weight updating, ELM achieves an extremely fast learning speed and has therefore attracted considerable attention. However, the random initialization of input-layer weights and hidden biases may generate suboptimal parameters, which have a negative impact on its generalization performance and prediction robustness.
To alleviate this weakness, many works have been proposed to further improve the generalization capability and stability of ELM, among which ELM ensemble algorithms are representative. Several of them are summarized as follows. The earliest ensemble-based ELM (EN-ELM) method was presented by Liu and Wang in [5]. EN-ELM introduced a cross-validation scheme into its training phase, where the original training dataset was partitioned into several subsets for cross-validation. The voting-based ELM (V-ELM) [6] trains multiple independent ELMs and determines the final output by majority voting. Many other ELM ensembles have also been developed, for example, based on genetic algorithms [7], heterogeneous individual learners [8], and application-specific designs [9–13].
As for ensembles of traditional neural networks, the most prevailing approaches are Bagging and Boosting. The Bagging scheme [14] generates several training datasets from the original training dataset and then trains a component neural network on each of them. The Boosting mechanism [15] generates a series of component neural networks whose training datasets are determined by the performance of the former ones. There are also many other approaches for training the component neural networks. Hampshire and Waibel [16] utilize different objective functions to train distinct component neural networks. Xu et al. [17] introduce the stochastic gradient boosting ensemble scheme to bioinformatics applications. Yao and Liu [18] regard all the individuals in an evolved population of neural networks as component networks.
In this paper, a new ELM ensemble scheme called Stochastic Gradient Boosting-based Extreme Learning Machine (SGB-ELM), which makes use of the mechanism of stochastic gradient boosting [19, 20], is proposed. SGB-ELM constructs an ensemble model by training a sequence of ELMs, where the output weights of each individual ELM are learned by optimizing a regularized objective in an additive manner. More specifically, we design an objective function based on the training mechanism of the boosting method and, to alleviate overfitting, simultaneously introduce a regularization item that controls the complexity of the ensemble model. The derivation formula for the output weights of the newly added ELM is then obtained by optimizing this objective with a second-order approximation. As the output weights of the newly added ELM at each iteration are difficult to calculate analytically from the derivation formula, we take the output weights learned from the pseudo-residual-based training dataset as an initial heuristic item and obtain the optimal output weights by using the derivation formula to update this heuristic item iteratively. Because the regularized objective favors functions that are both predictive and simple, and because a randomly selected subset rather than the whole training set is used to minimize the training residuals at each iteration, SGB-ELM continually improves the generalization capability of ELM while effectively avoiding overfitting. Experimental results in comparison with Bagging ELM, Boosting ELM, EN-ELM, and V-ELM show that SGB-ELM obtains better classification and regression performance, which demonstrates the feasibility and effectiveness of the SGB-ELM algorithm.
The rest of this paper is organized as follows. In Section 2, we briefly summarize the basic ELM model as well as the stochastic gradient boosting method. Section 3 introduces our proposed SGB-ELM algorithm. Experimental results are presented in Section 4. Finally, we conclude this paper and make some discussions in Section 5.
2. Preliminaries
In this section, we briefly review the basic ELM model and the stochastic gradient boosting method to provide the necessary background for the development of the SGB-ELM algorithm in Section 3.
2.1. Extreme Learning Machine
ELM is a special learning algorithm for SLFNs, which randomly selects the weights linking the input layer to the hidden layer and the biases of the hidden nodes, and analytically determines the output weights (linking the hidden layer to the output layer) by using the MP generalized inverse. Suppose we have a training dataset with $N$ distinct samples $\{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N}$; the hidden-layer output matrix $\mathbf{H}$ is computed from the randomly assigned input weights and biases, and the output weights $\boldsymbol{\beta}$ are then obtained as $\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{T}$, where $\mathbf{H}^{\dagger}$ denotes the MP generalized inverse of $\mathbf{H}$ and $\mathbf{T}$ is the target matrix.
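As a minimal illustration, the following Python sketch (using only NumPy; the function names, the sigmoid activation, and the $[-1, 1]$ range for the random parameters are illustrative choices) trains a basic ELM by generating the hidden-layer parameters at random and solving the output weights with the MP pseudoinverse:

```python
import numpy as np

def train_elm(X, T, n_hidden, rng=np.random.default_rng(0)):
    """Basic ELM sketch: random hidden layer + pseudoinverse output weights.
    X: (N, d) inputs, T: (N, m) targets."""
    d = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(d, n_hidden))   # random input weights (never trained)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)        # random hidden biases (never trained)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # hidden-layer output matrix (sigmoid)
    beta = np.linalg.pinv(H) @ T                     # output weights via MP generalized inverse
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```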
By avoiding iterative adjustment of the input-layer weights and hidden biases, ELM's training speed can be thousands of times faster than that of traditional gradient-based learning algorithms [2]. Meanwhile, ELM also produces good generalization performance; it has been verified that ELM can achieve generalization performance comparable to that of the typical support vector machine algorithm [3].
2.2. Stochastic Gradient Boosting
The stochastic gradient boosting scheme was proposed by Friedman in [20] as a variant of the gradient boosting method presented in [19]. Given a training set, gradient boosting constructs an additive ensemble model in a stagewise fashion: at each iteration, a weak individual learner is fitted to the pseudo-residuals, i.e., the negative gradient of the loss function evaluated at the current model's predictions, and is then added to the ensemble.
As gradient boosting constructs the additive ensemble model by sequentially fitting a weak individual learner to the current pseudo-residuals of the whole training dataset at each iteration, it costs much training time and may suffer from overfitting. In view of that, a minor modification named stochastic gradient boosting was proposed to incorporate some randomization into the procedure. Specifically, at each iteration a randomly selected subset, instead of the full training dataset, is used to fit the individual learner and compute the model update for the current iteration. Namely, a random subsample of size $\tilde{N} < N$ is drawn without replacement from the $N$ training instances at each iteration, and only this subsample enters the computation of the pseudo-residuals and the fitting of the new learner.
Stochastic gradient boosting can also be viewed as a special line-search optimization algorithm, which makes the newly added individual learner fit the steepest-descent direction of the partial training loss at each learning step.
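A compact sketch of this procedure under squared loss is given below; `fit_weak_learner` is a hypothetical placeholder for any base-learner training routine, and `subsample` controls the fraction of instances drawn at each iteration:

```python
import numpy as np

def stochastic_gradient_boosting(X, y, fit_weak_learner, n_rounds=50,
                                 learning_rate=0.1, subsample=0.5,
                                 rng=np.random.default_rng(0)):
    """Generic stochastic gradient boosting under squared loss (a sketch).
    fit_weak_learner(X_sub, r_sub) must return a callable h with h(X) -> predictions."""
    N = X.shape[0]
    f0 = y.mean()                      # initial constant model
    F = np.full(N, f0)
    learners = []
    for _ in range(n_rounds):
        residuals = y - F              # pseudo-residuals = negative gradient of the squared loss
        idx = rng.choice(N, size=int(subsample * N), replace=False)  # random subset, not the full data
        h = fit_weak_learner(X[idx], residuals[idx])                 # fit the weak learner on the subset only
        F += learning_rate * h(X)                                    # additive model update
        learners.append(h)
    return f0, learners
```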
3. Stochastic Gradient Boosting-Based Extreme Learning Machine (SGB-ELM)
SGB-ELM is a novel hybrid learning algorithm that introduces the stochastic gradient boosting method into the ELM ensemble procedure. As the boosting mechanism focuses on gradually reducing the training residuals at each iteration and ELM is a special multi-parameter network (particularly for classification tasks), instead of combining ELM and stochastic gradient boosting directly, we design an enhanced training scheme to alleviate possible overfitting in our proposed SGB-ELM algorithm. The detailed implementation of SGB-ELM is presented in Algorithm 2, and the determination of the optimal output weights for each individual ELM learner is illustrated in Algorithm 1.
Algorithm 1: Determination of the optimal output weights of the newly added individual ELM.
Input: the modified (pseudo-residual-based) training dataset, the randomly assigned input weights and hidden biases of the newly added ELM, and the derivation formula.
Output: the optimal output weights of the newly added ELM, obtained by taking the output weights fitted to the pseudo-residuals as the initial heuristic item and updating it iteratively with the derivation formula.

Algorithm 2: SGB-ELM.
Input: the training dataset, the loss function, the number of boosting iterations, and the learning parameters of each individual ELM.
Step 1. Randomly generate the input weights and hidden biases of the initial base learner within a fixed range; its output weights are determined analytically by the MP generalized inverse, which gives the initial base learner.
Step 2. For each boosting iteration: (a) a stochastic subset of the whole training dataset is defined; (b) compute the first-order gradients (pseudo residuals) with regard to the predicted output of the current ensemble model for each training instance in the subset; (c) compute the second-order gradients with regard to the predicted output of the current ensemble model for each training instance in the subset; (d) randomly assign the input weights and hidden biases of the newly added ELM and solve its output-layer weights for the derivation formula based on the modified training dataset formed from the pseudo residuals (Algorithm 1); (e) add the newly trained ELM to the current ensemble learning model.
Output: the final ensemble model.
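The following simplified sketch illustrates the overall training flow in Python for regression, reusing `train_elm` and `predict_elm` from the sketch in Section 2.1. It assumes a squared loss and a fixed learning rate, and it uses the output weights fitted to the pseudo-residuals directly, omitting the iterative refinement of Algorithm 1; it is an illustrative sketch rather than an exact reproduction of Algorithm 2.

```python
import numpy as np

def sgb_elm_regression(X, y, n_hidden=50, n_rounds=50, subsample=0.5,
                       learning_rate=0.1, rng=np.random.default_rng(0)):
    """Simplified SGB-ELM training loop for regression (an illustrative sketch only)."""
    N = X.shape[0]
    # Initial base learner: a plain ELM fitted to the targets.
    W, b, beta = train_elm(X, y.reshape(-1, 1), n_hidden, rng)
    F = predict_elm(X, W, b, beta).ravel()
    ensemble = [(W, b, beta, 1.0)]
    for _ in range(n_rounds):
        idx = rng.choice(N, size=int(subsample * N), replace=False)   # stochastic subset
        residuals = (y - F)[idx]                                      # pseudo-residuals on the subset
        # Output weights fitted to the pseudo-residuals play the role of the heuristic item;
        # the full algorithm would further refine them with the derivation formula (Algorithm 1).
        Wm, bm, beta_m = train_elm(X[idx], residuals.reshape(-1, 1), n_hidden, rng)
        F += learning_rate * predict_elm(X, Wm, bm, beta_m).ravel()
        ensemble.append((Wm, bm, beta_m, learning_rate))
    return ensemble
```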
There are many existing second-order approximation methods, including sequential quadratic programming (SQP) [21] and the majorization-minimization (MM) algorithm [22]. SQP is an effective method for nonlinearly constrained optimization that solves a sequence of quadratic subproblems. MM optimizes a local surrogate objective that is easier to solve than the original cost function. Instead of using a second-order approximation directly, SGB-ELM designs an optimization criterion for the output-layer weights of each individual ELM; quadratic approximation is merely employed as an optimization tool in SGB-ELM.
In SGB-ELM, the key issue is to determine the optimal output-layer weights of each weak individual ELM, which are expected to further decrease the training loss while keeping a simple network structure. Consequently, we design a learning objective that considers not only the fitting ability on the training instances but also the complexity of the ensemble model.
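A generic form of this objective, measuring the complexity of the newly added ELM $h_m$ by a squared-norm penalty on its output-layer weights $\boldsymbol{\beta}_m$ (the penalty term shown here is an illustrative assumption rather than the exact formulation of (8)), is

```latex
% Generic regularized boosting objective (illustrative form; the exact objective is (8)):
% training loss of the updated ensemble plus a complexity penalty on the new output weights.
\mathcal{L}^{(m)} = \sum_{i=1}^{N} L\bigl(y_i,\, f_{m-1}(\mathbf{x}_i) + h_m(\mathbf{x}_i)\bigr)
                  + \frac{\lambda}{2}\,\lVert \boldsymbol{\beta}_m \rVert^{2}.
```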
Following the boosting training mechanism, each individual ELM is greedily added to the current ensemble model in sequence so that it most improves the model according to (8). Specifically, the loss term of (8) is expanded around the predictions of the current ensemble model $f_{m-1}$ using a second-order Taylor approximation before the output weights of the newly added ELM $h_m$ are solved.
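Concretely, writing $g_i$ and $s_i$ for the first- and second-order gradients of the loss with respect to the model output, evaluated at $f_{m-1}(\mathbf{x}_i)$ for the instances in the stochastic subset (illustrative notation), the approximation reads

```latex
% Second-order (Newton-type) expansion of the loss around the current ensemble prediction,
% with h_m(x) = h(x) * beta_m the output of the newly added ELM (h(x): hidden-layer output vector).
\mathcal{L}^{(m)} \approx \sum_{i}\Bigl[ L\bigl(y_i, f_{m-1}(\mathbf{x}_i)\bigr)
   + g_i\, h_m(\mathbf{x}_i) + \tfrac{1}{2}\, s_i\, h_m(\mathbf{x}_i)^{2} \Bigr]
   + \frac{\lambda}{2}\,\lVert \boldsymbol{\beta}_m \rVert^{2}.
```

Minimizing this quadratic with respect to $\boldsymbol{\beta}_m$ yields the derivation formula; since it is hard to solve analytically in general, the heuristic iterative update of Algorithm 1 is applied instead.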
In Algorithm 2, all the input weights and hidden biases of the individual ELMs are randomly chosen within a fixed range (typically $[-1, 1]$ for ELM) and kept frozen, so that only the output-layer weights of each newly added ELM need to be learned.
4. Performance Validation
In this section, a series of experiments is conducted to validate the feasibility and effectiveness of our proposed SGB-ELM algorithm, and we compare the generalization performance and prediction stability of several typical ensemble learning methods (EN-ELM [5], V-ELM [6], Bagging [14], and Adaboost [15]) on 4 KEEL [25] regression and 5 UCI [26] classification datasets. For all of the above-mentioned ensemble methods, the basic ELM model proposed in [2] is used as the individual learner, where the sigmoid function $g(x) = 1/(1+\exp(-x))$ is adopted as the activation function.
4.1. Performance Evaluation of SGB-ELM
For the regression problem, the performance of SGB-ELM and the comparative algorithms is measured by the Root Mean Square Error (RMSE), which reveals the difference between the predicted values and the targets: given $N$ testing instances with targets $y_i$ and predictions $\hat{y}_i$, $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2}$. Additionally, in this paper we take the squared loss as the loss function of SGB-ELM for the regression task.
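For the squared loss, the quantities entering the boosting update are particularly simple (a standard fact, stated here for completeness):

```latex
% Squared loss and its first/second derivatives with respect to the model output f:
L(y, f) = \tfrac{1}{2}(y - f)^{2}, \qquad
-\frac{\partial L}{\partial f} = y - f \;(\text{pseudo-residual}), \qquad
\frac{\partial^{2} L}{\partial f^{2}} = 1.
```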
The performances of the traditional ELM, simple ensemble ELM, Bagging ELM, Adaboost ELM, and our proposed SGB-ELM are compared on 4 representative regression datasets selected from the KEEL repository [25]. In our experiments, all the input attributes of each dataset are normalized to a common range before training; the characteristics of these datasets are summarized in Table 1.
Table 1
Details of 4 KEEL regression datasets.
| No. | Datasets | Condition attributes | Training samples | Testing samples |
|---|---|---|---|---|
| 1 | Laser | 4 | 695 | 298 |
| 2 | Friedman | 5 | 840 | 360 |
| 3 | Mortgage | 15 | 734 | 315 |
| 4 | Wizmir | 9 | 1023 | 438 |
Table 2
The comparison results between SGB-ELM and other representative algorithms on 4 regression datasets.
| Dataset | Algorithm | Training time | Training RMSE (Dev) | Testing RMSE (Dev) | Hidden nodes | Iterations |
|---|---|---|---|---|---|---|
| Laser | ELM | 0.0097 | 12.3791 | 12.7783 | 80 | N/A |
| | Simple ensemble | 0.1547 | 12.1216 | 12.8794 | 80 | 10 |
| | Bagging | 0.7991 | 12.1707 | 13.4085 | 80 | 50 |
| | Adaboost | 0.1591 | 11.4460 | 12.0666 | 50 | Max = 50 |
| | SGB-ELM | 3.4853 | 7.6354 | 8.4170 | 50 ( | 50 |
| Friedman | ELM | 0.0141 | 1.4220 | 1.5124 | 100 | N/A |
| | Simple ensemble | 0.2081 | 1.4005 | 1.4791 | 100 | 10 |
| | Bagging | 1.0144 | 1.4111 | 1.4906 | 100 | 50 |
| | Adaboost | 0.5219 | 1.2551 | 1.3342 | 60 | Max = 50 |
| | SGB-ELM | 4.8853 | 1.0627 | 1.1581 | 60 ( | 50 |
| Mortgage | ELM | 0.0200 | 0.0855 | 0.0961 | 150 | N/A |
| | Simple ensemble | 0.3044 | 0.0843 | 0.0947 | 150 | 10 |
| | Bagging | 1.4544 | 0.0834 | 0.0937 | 150 | 50 |
| | Adaboost | 0.4778 | 0.0785 | 0.0885 | 80 | Max = 50 |
| | SGB-ELM | 6.2434 | 0.0607 | 0.0759 | 80 ( | 50 |
| Wizmir | ELM | 0.0128 | 1.0906 | 1.1263 | 100 | N/A |
| | Simple ensemble | 0.2066 | 1.0869 | 1.1203 | 100 | 10 |
| | Bagging | 1.0366 | 1.0859 | 1.1165 | 100 | 50 |
| | Adaboost | 0.4331 | 1.0622 | 1.1091 | 60 | Max = 50 |
| | SGB-ELM | 5.6525 | 1.0148 | 1.1032 | 60 ( | 50 |
For the classification problem, like other typical feedforward neural networks (for instance, BP neural networks [28]), SGB-ELM evaluates the predicted output by calculating the sum of squared errors between the network outputs and the one-hot-encoded targets. Specifically, the predicted label of an instance is taken as the index of the largest entry of the output vector produced by the ensemble model.
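A small sketch of this evaluation in Python, assuming one-hot-encoded targets `T` and an ensemble output matrix `F` of shape `(N, C)` (the names are illustrative):

```python
import numpy as np

def squared_error_and_accuracy(F, T):
    """F: (N, C) predicted outputs, T: (N, C) one-hot targets (a sketch)."""
    sse = np.sum((F - T) ** 2)            # sum of squared errors used as the training criterion
    y_pred = np.argmax(F, axis=1)         # predicted label = index of the largest output entry
    y_true = np.argmax(T, axis=1)
    accuracy = float(np.mean(y_pred == y_true))
    return sse, accuracy
```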
Similarly, we select 5 popular classification datasets from the UCI Machine Learning Repository [26] to verify the performance of the proposed SGB-ELM algorithm. For each dataset, all the decision attributes are encoded by the one-hot scheme [29]. The characteristics of these datasets are described in Table 3, where each original dataset is equally divided into two groups: a training set and a testing set of equal size.
Table 3
Details of 5 UCI classification datasets.
| No. | Datasets | Condition attributes | Decision attributes | Training samples | Testing samples |
|---|---|---|---|---|---|
| 1 | Image segmentation | 19 | 7 | 1155 | 1155 |
| 2 | Texture | 40 | 11 | 2750 | 2750 |
| 3 | Spambase | 57 | 2 | 2295 | 2294 |
| 4 | Banana | 2 | 2 | 2650 | 2650 |
| 5 | Ring | 20 | 2 | 3700 | 3700 |
Table 4
The comparison results between SGB-ELM and other representative algorithms on 5 classification datasets.
| Dataset | Algorithm | Training time | Training accuracy (Dev) | Testing accuracy (Dev) | Hidden nodes | Iterations |
|---|---|---|---|---|---|---|
| Segmentation | ELM | 0.0431 | 0.9465 | 0.9351 | 180 | N/A |
| | V-ELM | 0.4487 | 0.9463 | 0.9374 | 180 | 7 |
| | EN-ELM | 43.9234 | 0.9472 | 0.9353 | 180 ( | 50 |
| | Bagging | 3.1853 | 0.9474 | 0.9353 | 180 | 50 |
| | Adaboost | 3.5372 | 0.9853 | 0.9466 | 100 | Max = 100 |
| | SGB-ELM | 134.2969 | 0.9761 | 0.9558 | 100 ( | 100 |
| Texture | ELM | 0.0338 | 0.9954 | 0.9945 | 100 | N/A |
| | V-ELM | 0.4275 | 0.9965 | 0.9950 | 100 | 7 |
| | EN-ELM | 44.4969 | 0.9963 | 0.9946 | 100 ( | 50 |
| | Bagging | 3.0959 | 0.9965 | 0.9957 | 100 | 50 |
| | Adaboost | 10.3628 | 0.9996 | 0.9972 | 60 | Max = 100 |
| | SGB-ELM | 193.2019 | 0.9992 | 0.9982 | 60 ( | 100 |
| Spambase | ELM | 0.0459 | 0.9174 | 0.9080 | 150 | N/A |
| | V-ELM | 0.4213 | 0.9192 | 0.9115 | 150 | 7 |
| | EN-ELM | 62.6000 | 0.9183 | 0.9071 | 150 ( | 50 |
| | Bagging | 2.9869 | 0.9219 | 0.9145 | 150 | 50 |
| | Adaboost | 7.6875 | 0.9620 | 0.9234 | 100 | Max = 100 |
| | SGB-ELM | 129.0922 | 0.9522 | 0.9222 | 100 ( | 100 |
| Banana | ELM | 0.0550 | 0.6838 | 0.6787 | 180 | N/A |
| | V-ELM | 0.4906 | 0.6860 | 0.6848 | 180 | 7 |
| | EN-ELM | 67.5578 | 0.6821 | 0.6780 | 180 ( | 50 |
| | Bagging | 3.3253 | 0.6808 | 0.6777 | 180 | 50 |
| | Adaboost | 7.5100 | 0.7457 | 0.7448 | 100 | Max = 100 |
| | SGB-ELM | 133.0791 | 0.7610 | 0.7563 | 100 ( | 100 |
| Ring | ELM | 0.0897 | 0.9492 | 0.9418 | 200 | N/A |
| | V-ELM | 0.7609 | 0.9532 | 0.9466 | 200 | 7 |
| | EN-ELM | 114.0641 | 0.9517 | 0.9418 | 200 ( | 50 |
| | Bagging | 5.5241 | 0.9539 | 0.9468 | 200 | 50 |
| | Adaboost | 17.2109 | 0.9940 | 0.9524 | 150 | Max = 100 |
| | SGB-ELM | 363.7976 | 0.9750 | 0.9567 | 150 ( | 100 |
[figures omitted; refer to PDF]
Tables 2 and 4 present the comparison results, including training time, training RMSE/accuracy, and testing RMSE/accuracy, for the regression and classification tasks, respectively. SGB-ELM obtains better generalization capability in most cases without significantly increasing the training time. At the same time, SGB-ELM tends to have smaller training and testing deviations (Dev) than the comparative learning algorithms, which validates the robustness and stability of the proposed SGB-ELM algorithm. In particular, since SGB-ELM adopts a training mechanism similar to that of Adaboost, which integrates multiple weak individual learners sequentially, the number of hidden nodes is set to a smaller value for both SGB-ELM and Adaboost. It is worth noting that SGB-ELM achieves better performance than the existing methods with fewer hidden nodes and outperforms Adaboost with the same number of hidden nodes.
From Figures 1 and 2, we can see that SGB-ELM is more stable than the traditional ELM, simple ensemble, Bagging, and Adaboost.R2 on the regression problems and also exhibits better robustness than V-ELM, EN-ELM, Bagging, and Adaboost.SAMME on the classification problems. This shows that SGB-ELM not only focuses on reducing the prediction bias, as other boosting-like methods do, but also generates a robust ensemble model with low variance. As observed in Figure 2, although Adaboost.SAMME produces higher training accuracy than SGB-ELM during most of the 50 trials, SGB-ELM obtains better generalization capability (testing accuracy). This can mainly be attributed to two factors: the regularization item in the objective controls the complexity of the ensemble model, and the stochastic subsampling at each iteration reduces overfitting to the training instances.
Figure 3 shows the training RMSE/accuracy and testing RMSE/accuracy of Adaboost (Adaboost.R2 for regression and Adaboost.SAMME for classification) and SGB-ELM with regard to the number of iterations. The fixed reference line denotes the training and testing performance of a traditional ELM equipped with many more hidden nodes. As shown in Figure 3, SGB-ELM clearly improves the generalization capability of the initial base ELM in both the regression and classification tasks. From Figure 3(a), we can see that the training and testing RMSE decline gradually as the number of iterations increases; similarly, both the training and testing accuracy curves show an increasing trend in Figure 3(b). Multiple random initializations of the parameters of the initial base learner are conducted in these trials.
From the experimental results on both the regression and classification problems, we conclude that the proposed SGB-ELM algorithm not only achieves better generalization capability (lower prediction bias) than the typical existing variants of ELM, but also yields a sufficiently robust ELM ensemble learning model (lower prediction variance).
4.2. Impact of Learning Parameters on Training SGB-ELM
To achieve good generalization performance, three learning parameters of SGB-ELM, including the number of hidden nodes of each individual ELM, need to be chosen appropriately; their impact on training SGB-ELM is analyzed in this subsection.
For the basic ELM model, the number of hidden nodes is the key parameter controlling the capacity of the network: too few hidden nodes limit the fitting ability, whereas too many may lead to overfitting.
[figures omitted; refer to PDF]
Figure 4 illustrates how the training and testing performance of SGB-ELM change as the value of this learning parameter is varied.
From Figure 5, it is obvious that randomization improves the performance of SGB-ELM substantially. As each weak individual ELM is learned on a randomly selected subset of the whole training dataset, the diversity among the individual learners increases. On the other hand, randomization introduces a noisy estimate of the total training loss; as a result, it slows down convergence and may even make the learning curve fluctuate (higher variance) if the randomly selected subset is too small.
5. Conclusions
In this paper, we proposed a novel ensemble model named Stochastic Gradient Boosting-based Extreme Learning Machine (SGB-ELM). Instead of combining ELM and stochastic gradient boosting directly, we construct a sequence of weak ELMs in which the output-layer weights of each ELM are determined by optimizing a regularized objective additively. First, by minimizing the objective with a second-order approximation, the derivation formula for the output-layer weights of each individual ELM is obtained. Then we take the output-layer weights learned from the current pseudo-residuals as a heuristic item and obtain the optimal output-layer weights by updating this heuristic item iteratively. The performance of SGB-ELM was evaluated on 4 regression and 5 classification datasets. In comparison with several typical ELM ensemble methods, SGB-ELM obtained better performance and robustness, which demonstrates the feasibility and effectiveness of the SGB-ELM algorithm.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.
Authors’ Contributions
Hua Guo and Jikui Wang contributed equally to this work.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (61503252 and 61473194), the China Postdoctoral Science Foundation (2016T90799), and the Natural Science Foundation of Gansu Province (17JR5RA177).
[1] G. B. Huang, Q. Y. Zhu, C. K. Siew, "Extreme learning machine: a new learning scheme of feedforward neural networks," Proceedings of the IEEE International Joint Conference on Neural Networks, vol. 2, pp. 985-990, DOI: 10.1109/IJCNN.2004.1380068, 2004.
[2] G. B. Huang, Q. Y. Zhu, C. K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70 no. 1–3, pp. 489-501, DOI: 10.1016/j.neucom.2005.12.126, 2006.
[3] G.-B. Huang, H. Zhou, X. Ding, R. Zhang, "Extreme learning machine for regression and multiclass classification," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42 no. 2, pp. 513-529, DOI: 10.1109/TSMCB.2011.2168604, 2012.
[4] R. Penrose, "A generalized inverse for matrices," Mathematical Proceedings of the Cambridge Philosophical Society, vol. 51 no. 3, pp. 406-413, DOI: 10.1017/S0305004100030401, 1955.
[5] N. Liu, H. Wang, "Ensemble based extreme learning machine," IEEE Signal Processing Letters, vol. 17 no. 8, pp. 754-757, DOI: 10.1109/LSP.2010.2053356, 2010.
[6] J. Cao, Z. Lin, G.-B. Huang, N. Liu, "Voting based extreme learning machine," Information Sciences, vol. 185, pp. 66-77, DOI: 10.1016/j.ins.2011.09.015, 2012.
[7] X. Xue, M. Yao, Z. Wu, J. Yang, "Genetic ensemble of extreme learning machine," Neurocomputing, vol. 129, pp. 175-184, DOI: 10.1016/j.neucom.2013.09.042, 2014.
[8] A. O. M. Abuassba, D. Zhang, X. Luo, A. Shaheryar, H. Ali, "Improving Classification Performance through an Advanced Ensemble Based Heterogeneous Extreme Learning Machines," Computational Intelligence and Neuroscience, vol. 2017, 2017.
[9] M. Han, B. Liu, "Ensemble of extreme learning machine for remote sensing image classification," Neurocomputing, vol. 149, pp. 65-70, DOI: 10.1016/j.neucom.2013.09.070, 2015.
[10] H.-J. Lu, C.-L. An, E.-H. Zheng, Y. Lu, "Dissimilarity based ensemble of extreme learning machine for gene expression data classification," Neurocomputing, vol. 128, pp. 22-30, DOI: 10.1016/j.neucom.2013.02.052, 2014.
[11] B. Mirza, Z. Lin, N. Liu, "Ensemble of subset online sequential extreme learning machine for class imbalance and concept drift," Neurocomputing, vol. 149, pp. 316-329, DOI: 10.1016/j.neucom.2014.03.075, 2015.
[12] D. Wang, M. Alhamdoosh, "Evolutionary extreme learning machine ensembles with size control," Neurocomputing, vol. 102, pp. 98-110, DOI: 10.1016/j.neucom.2011.12.046, 2013.
[13] X.-Z. Wang, R. Wang, H.-M. Feng, H.-C. Wang, "A new approach to classifier fusion based on upper integral," IEEE Transactions on Cybernetics, vol. 44 no. 5, pp. 620-635, DOI: 10.1109/TCYB.2013.2263382, 2014.
[14] L. Breiman, "Bagging predictors," Machine Learning, vol. 24 no. 2, pp. 123-140, 1996.
[15] Y. Freund, R. Schapire, "A short introduction to boosting," Journal of Japanese Society For Artificial Intelligence, vol. 14, pp. 771-780, 1999.
[16] J. B. Hampshire, A. H. Waibel, "Novel objective function for improved phoneme recognition using time-delay neural networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 1 no. 2, pp. 216-228, DOI: 10.1109/72.80233, 1990.
[17] Q. Xu, Y. Xiong, H. Dai, K. M. Kumari, Q. Xu, H.-Y. Ou, D.-Q. Wei, "PDC-SGB: Prediction of effective drug combinations using a stochastic gradient boosting algorithm," Journal of Theoretical Biology, vol. 417, DOI: 10.1016/j.jtbi.2017.01.019, 2017.
[18] X. Yao, Y. Liu, "Making use of population information in evolutionary artificial neural networks," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 28 no. 3, pp. 417-425, DOI: 10.1109/3477.678637, 1998.
[19] J. H. Friedman, "Greedy function approximation: a gradient boosting machine," The Annals of Statistics, vol. 29 no. 5, pp. 1189-1232, DOI: 10.1214/aos/1013203451, 2001.
[20] J. H. Friedman, "Stochastic gradient boosting," Computational Statistics & Data Analysis, vol. 38 no. 4, pp. 367-378, DOI: 10.1016/s0167-9473(01)00065-2, 2002.
[21] P. T. Boggs, J. W. Tolle, "Sequential quadratic programming," Acta Numerica, vol. 4, DOI: 10.1017/S0962492900002518, 1995.
[22] M. A. Figueiredo, J. M. Bioucas-Dias, R. D. Nowak, "Majorization-minimization algorithms for wavelet-based image restoration," IEEE Transactions on Image Processing, vol. 16 no. 12, pp. 2980-2991, DOI: 10.1109/TIP.2007.909318, 2007.
[23] R. Battiti, "First- and Second-Order Methods for Learning: Between Steepest Descent and Newton's Method," Neural Computation, vol. 4 no. 2, pp. 141-166, DOI: 10.1162/neco.1992.4.2.141, 1992.
[24] P. L. Bartlett, "The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network," Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 44 no. 2, pp. 525-536, DOI: 10.1109/18.661502, 1998.
[25] J. Alcalá-Fdez, A. Fernández, J. Luengo, J. Derrac, S. García, L. Sánchez, F. Herrera, "KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework," Journal of Multiple-Valued Logic and Soft Computing, vol. 17 no. 2-3, pp. 255-287, 2011.
[26] M. Lichman, UCI Machine Learning Repository, 2013.
[27] H. Drucker, "Improving regressors using boosting techniques," Proceedings of the International Conference on Machine Learning, vol. 97, pp. 107-115, 1997.
[28] D. E. Rumelhart, G. E. Hinton, R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323 no. 6088, pp. 533-536, DOI: 10.1038/323533a0, 1986.
[29] A. Coates, A. Y. Ng, "The importance of encoding versus training with sparse coding and vector quantization," Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 921-928, 2011.
[30] J. Zhu, H. Zou, S. Rosset, T. Hastie, "Multi-class AdaBoost," Statistics and Its Interface, vol. 2 no. 3, pp. 349-360, DOI: 10.4310/SII.2009.v2.n3.a8, 2009.
Copyright © 2018 Hua Guo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. https://creativecommons.org/licenses/by/4.0/
Abstract
A novel ensemble scheme for the extreme learning machine (ELM), named Stochastic Gradient Boosting-based Extreme Learning Machine (SGB-ELM), is proposed in this paper. Instead of incorporating the stochastic gradient boosting method into the ELM ensemble procedure directly, SGB-ELM constructs a sequence of weak ELMs in which each individual ELM is trained additively by optimizing a regularized objective. Specifically, we design an objective function based on the boosting mechanism and simultaneously introduce a regularization item to alleviate overfitting. Then the derivation formula for the output-layer weights of each weak ELM is determined using a second-order optimization. As the derivation formula is hard to solve analytically and the regularized objective favors simple functions, we take the output-layer weights learned from the current pseudo-residuals as an initial heuristic item and obtain the optimal output-layer weights by using the derivation formula to update the heuristic item iteratively. In comparison with several typical ELM ensemble methods, SGB-ELM achieves better generalization performance and prediction robustness, which demonstrates the feasibility and effectiveness of SGB-ELM.