
An extreme learning machine algorithm for semi-supervised classification of unbalanced data streams with concept drift

Published in Multimedia Tools and Applications

Abstract

Data streams are important sources of information nowadays, and with the popularization of mobile devices and sensor systems that collect all kinds of data, more and more information is generated at an ever-increasing speed. This growth in data supply poses problems for traditional machine learning algorithms. Tasks such as classification, regression, and clustering face limitations when applied to very large datasets, data streams, or changing data. The high cost of labeling instances for training classification algorithms makes it difficult to use fully supervised algorithms. Unbalanced datasets tend to cause algorithms to ignore one or more classes. Moreover, concept drifts in data streams require algorithms to be retrained from time to time. To tackle these problems, a semi-supervised and online algorithm based on the Extreme Learning Machine (ELM), called SSOE-FP-ELM, is proposed and detailed. Experimental results show that the proposed algorithm outperforms others in the literature in accuracy, generalization ability, and concept drift detection and recovery, making it a suitable alternative for data stream classification.


Data Availability

The COIL20 and COIL100 datasets were collected from the Center for Research on Intelligent Systems at the Department of Computer Science, Columbia University [35, 36]. The KDEF_Front dataset is a subset of the KDEF dataset [33], using only frontal face images converted to grayscale and reduced to 30x30 pixels. The DNA dataset is available at the LIBSVM datasets repository [15, 47]. All other datasets (CrowdSourcedMapping, Gisette, Isolet, Musk, Spambase, StatlogImageSegmentation, StatlogLandSatellite, Waveform) are available at the UCI Machine Learning Repository [14].

References

  1. Agrahari S, Singh AK (2022) Concept drift detection in data stream mining: a literature review. J King Saud Univ Comput Inf Sci 34(10, Part B):9523–9540. https://doi.org/10.1016/j.jksuci.2021.11.006

  2. Akusok A, Björk KM, Miche Y et al (2015) High-performance extreme learning machines: a complete toolbox for big data applications. IEEE Access 3:1011–1025. https://doi.org/10.1109/ACCESS.2015.2450498

  3. Anderson R, Koh YS, Dobbie G et al (2019) Recurring concept meta-learning for evolving data streams. Expert Syst Appl 138:112832. https://doi.org/10.1016/j.eswa.2019.112832

  4. Barros RSM, Santos SGTC (2018) A large-scale comparison of concept drift detectors. Inf Sci 451–452:348–370. https://doi.org/10.1016/j.ins.2018.04.014

  5. de Barros RSM, Hidalgo JIG, de Lima Cabral DR (2018) Wilcoxon rank sum test drift detector. Neurocomputing 275:1954–1963. https://doi.org/10.1016/j.neucom.2017.10.051

  6. Bartlett PL (1998) The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Trans Inf Theory 44(2):525–536. https://doi.org/10.1109/18.661502

  7. Ben-Israel A, Greville TN (2003) Generalized inverses: theory and applications, vol 15. Springer Science & Business Media

  8. Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13(Feb):281–305. https://dl.acm.org/doi/abs/10.5555/2188385.2188395

  9. Budiman A, Fanany MI, Basaruddin C (2016) Adaptive online sequential ELM for concept drift tackling. Comput Intell Neurosci 2016. https://doi.org/10.1155/2016/8091267

  10. Cormen TH, Leiserson CE, Rivest RL et al (2009) Introduction to algorithms. MIT press

  11. da Costa FG, Rios RA, de Mello RF (2016) Using dynamical systems tools to detect concept drift in data streams. Expert Syst Appl 60:39–50. https://doi.org/10.1016/j.eswa.2016.04.026

  12. Derrac J, García S, Molina D et al (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol Comput 1(1):3–18. https://doi.org/10.1016/j.swevo.2011.02.002

  13. Du Prel JB, Röhrig B, Hommel G et al (2010) Choosing statistical tests: part 12 of a series on evaluation of scientific publications. Dtsch Arztebl Int 107(19):343. https://doi.org/10.3238/arztebl.2010.0343

  14. Dua D, Graff C (2019) UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed 30 Nov 2019

  15. Fan RE, Lin CJ (2019) LIBSVM Data: classification, regression, and multi-label. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/. Accessed 30 Nov 2019

  16. Gama J, Žliobaitė I, Bifet A et al (2014) A survey on concept drift adaptation. ACM Comput Surv 46(4). https://doi.org/10.1145/2523813

  17. Gu X (2023) A dual-model semi-supervised self-organizing fuzzy inference system for data stream classification. Appl Soft Comput 136:110053. https://doi.org/10.1016/j.asoc.2023.110053

  18. Han F, Yao HF, Ling QH (2013) An improved evolutionary extreme learning machine based on particle swarm optimization. Neurocomputing 116:87–93. https://doi.org/10.1016/j.neucom.2011.12.062

  19. Hayashi Y, Sakata M, Gallant SI (1990) Multi-layer versus single-layer neural networks and an application to reading hand-stamped characters. In: International neural network conference, Springer, pp 781–784. https://doi.org/10.1007/978-94-009-0643-3_74

  20. He X, Cai D, Niyogi P (2005) Laplacian score for feature selection. In: Advances in neural information processing systems, pp 507–514

  21. Hsu CW, Chang CC, Lin CJ (2003) A practical guide to support vector classification (technical report). Department of Computer Science and Information Engineering, National Taiwan University, Taipei. www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf. Accessed 30 Nov 2019

  22. Huang G, Song S, Gupta JN et al (2014) Semi-supervised and unsupervised extreme learning machines. IEEE Trans Cybern 44(12):2405–2417. https://doi.org/10.1109/TCYB.2014.2307349

  23. Huang GB, Zhu QY, Siew CK (2004) Extreme learning machine: a new learning scheme of feedforward neural networks. In: 2004 IEEE International joint conference on neural networks, IEEE, pp 985–990. https://doi.org/10.1109/IJCNN.2004.1380068

  24. Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1):489–501. https://doi.org/10.1016/j.neucom.2005.12.126

  25. Huang GB, Zhou H, Ding X et al (2012) Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern Part B Cybern 42(2):513–529. https://doi.org/10.1109/TSMCB.2011.2168604

  26. Jia X, Wang R, Liu J et al (2016) A semi-supervised online sequential extreme learning machine method. Neurocomputing 174:168–178. https://doi.org/10.1016/j.neucom.2015.04.102

  27. Lei Y, Chen X, Min M et al (2020) A semi-supervised Laplacian extreme learning machine and feature fusion with CNN for industrial superheat identification. Neurocomputing 381:186–195. https://doi.org/10.1016/j.neucom.2019.11.012

  28. Li L, Sun R, Cai S et al (2019) A review of improved extreme learning machine methods for data stream classification. Multimed Tools Appl 78:33375–33400. https://doi.org/10.1007/s11042-019-7543-2

  29. Li Q, Xiong Q, Ji S et al (2021) Incremental semi-supervised extreme learning machine for mixed data stream classification. Expert Syst Appl 185:115591. https://doi.org/10.1016/j.eswa.2021.115591

  30. Li Y, Wang Y, Liu Q et al (2019) Incremental semi-supervised learning on streaming data. Pattern Recognit 88:383–396. https://doi.org/10.1016/j.patcog.2018.11.006

  31. Liang NY, Huang GB, Saratchandran P et al (2006) A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Trans Neural Netw 17(6):1411–1423. https://doi.org/10.1109/TNN.2006.880583

  32. Liu D, Wu Y, Jiang H (2016) FP-ELM: an online sequential learning algorithm for dealing with concept drift. Neurocomputing 207:322–334. https://doi.org/10.1016/j.neucom.2016.04.043

  33. Lundqvist D, Flykt A, Öhman A (1998) The Karolinska directed emotional faces (KDEF). CD ROM from Department of Clinical Neuroscience, Psychology section, Karolinska Institutet

  34. Ma L, Ma A, Ju C et al (2016) Graph-based semi-supervised learning for spectral-spatial hyperspectral image classification. Pattern Recogn Lett 83:133–142. https://doi.org/10.1016/j.patrec.2016.01.022

  35. Nene SA, Nayar SK, Murase H (1996a) Columbia object image library (COIL-100)

  36. Nene SA, Nayar SK, Murase H (1996b) Columbia object image library (COIL-20)

  37. Pao YH, Takefuji Y (1992) Functional-link net computing: theory, system architecture, and functionalities. Computer 25(5):76–79. https://doi.org/10.1109/2.144401

  38. Pao YH, Park GH, Sobajic DJ (1994) Learning and generalization characteristics of the random vector functional-link net. Neurocomputing 6(2):163–180. https://doi.org/10.1016/0925-2312(94)90053-1

  39. Qiu S, Li P, Hu X (2022) Semi-supervised online kernel extreme learning machine for multi-label data stream classification. In: 2022 International joint conference on neural networks (IJCNN), pp 1–8. https://doi.org/10.1109/IJCNN55064.2022.9892701

  40. Rumelhart DE, Hinton GE, Williams RJ et al (1988) Learning representations by back-propagating errors. Cognit Model 5(3):1. https://doi.org/10.1038/323533a0

  41. Sawant SS, Prabukumar M (2018) A review on graph-based semi-supervised learning methods for hyperspectral image classification. Egypt J Remote Sens Space Sci. https://doi.org/10.1016/j.ejrs.2018.11.001

  42. Schmidt WF, Kraaijveld MA, Duin RP et al (1992) Feed forward neural networks with random weights. In: International conference on pattern recognition, IEEE Computer Society Press, pp 1–1

  43. Sethi TS, Kantardzic M (2017) On the reliable detection of concept drift from streaming unlabeled data. Expert Syst Appl 82:77–99. https://doi.org/10.1016/j.eswa.2017.04.008

  44. da Silva CAS, Krohling RA (2018) Semi-supervised online elastic extreme learning machine for data classification. In: 2018 IEEE IJCNN - International Joint Conference on Neural Networks, IEEE, pp 1511–1518. https://doi.org/10.1109/IJCNN.2018.8489632

  45. da Silva CAS, Krohling RA (2019) Semi-supervised online elastic extreme learning machine with forgetting parameter to deal with concept drift in data streams. In: 2019 IEEE IJCNN - International joint conference on neural networks, IEEE, pp 1–8. https://doi.org/10.1109/IJCNN.2019.8852361

  46. Tsymbal A (2004) The problem of concept drift: definitions and related work. Comput Sci Dep Trinity College Dublin 106(2):58

  47. Vanschoren J, van Rijn JN, Bischl B et al (2013) OpenML: networked science in machine learning. SIGKDD Explorations 15(2):49–60. https://doi.org/10.1145/2641190.2641198

  48. Wang H, Abraham Z (2015) Concept drift detection for streaming data. In: International joint conference on neural networks (IJCNN), IEEE, pp 1–9. https://doi.org/10.1109/IJCNN.2015.7280398

  49. Wang J, Lu S, Wang SH et al (2022) A review on extreme learning machine. Multimed Tools Appl 81(29):41611–41660. https://doi.org/10.1007/s11042-021-11007-7

  50. Xie J, Liu S, Dai H (2019) Manifold regularization based distributed semi-supervised learning algorithm using extreme learning machine over time-varying network. Neurocomputing 355:24–34. https://doi.org/10.1016/j.neucom.2019.03.079

  51. Xin J, Wang Z, Qu L et al (2015) Elastic extreme learning machine for big data classification. Neurocomputing 149:464–471. https://doi.org/10.1016/j.neucom.2013.09.075

  52. Xu S, Wang J (2017) Dynamic extreme learning machine for data stream classification. Neurocomputing 238:433–449. https://doi.org/10.1016/j.neucom.2016.12.078

  53. Yan J, Cao Y, Kang B et al (2021) An elm-based semi-supervised indoor localization technique with clustering analysis and feature extraction. IEEE Sensors J 21(3):3635–3644. https://doi.org/10.1109/JSEN.2020.3028579

  54. Yang Z, Cohen WW, Salakhutdinov R (2016) Revisiting semi-supervised learning with graph embeddings. In: Proceedings of The 33rd International Conference on Machine Learning, pp 40–48. http://proceedings.mlr.press/v48/yanga16.html

  55. Zhang Z, Cai Y, Gong W (2023) Semi-supervised learning with graph convolutional extreme learning machines. Expert Syst Appl 213:119164. https://doi.org/10.1016/j.eswa.2022.119164

  56. Zhao J, Wang Z, Park DS (2012) Online sequential extreme learning machine with forgetting mechanism. Neurocomputing 87:79–89. https://doi.org/10.1016/j.neucom.2012.02.003

  57. Zheng X, Li P, Wu X (2022) Data stream classification based on extreme learning machine: a review. Big Data Res 30:100356. https://doi.org/10.1016/j.bdr.2022.100356


Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. R.A. Krohling would like to thank the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) - grant n. 309729/2018-1 - and the Fundação de Amparo a Pesquisa e Inovação do Espírito Santo (FAPES) - grant n. 575/2018. The authors would like to thank the support of NVIDIA Corporation with the donation of the Titan X GPU used for this research.

Author information

Corresponding author

Correspondence to Carlos A. S. da Silva.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Standard ELM

Unlike gradient descent-based learning methods, ELM does not perform an iterative adjustment of weights. The input weights of the network are randomly generated and kept fixed during the training phase, while the output weights are calculated analytically using the least squares method [24]. As a consequence, the training time of ELM is a few hundred (or even thousand) times shorter than that of the backpropagation algorithm, according to [24]. Furthermore, ELM is not susceptible to getting stuck in local minima or to the slow convergence common in backpropagation.

Fig. 12 Illustration of an SLFN trained with ELM

Figure 12 shows an SLFN trained with the ELM algorithm. Consider a training set \((x_{i}, t_{i}), x_{i} \in R^d, t_{i} \in R^m, i = 1, \ldots , N\), where N is the number of training samples, d is the number of features and m is the number of classes; an activation function g(x), a sigmoid function for example; and a number n of neurons in the hidden layer. The relationship between the input data \(x_{i}\) and the network output values \(t_{i}\) is given by:

$$\begin{aligned} t_{i} = \sum _{j=1}^{n}\beta _{j}g(w_{j}x_{i} + b_{j}), \quad i = 1, \ldots , N \end{aligned}$$
(A1)

where \(\beta _{j}\) is the output weight vector of the j-th hidden neuron, and \(w_{j}\) and \(b_{j}\) are its input weights and bias. In ELM, \(w_{j}\) and \(b_{j}\) are set randomly before the training phase and are not adjusted.

Equation (A1) can be written in the matrix form \(H\beta = T\), where H is the hidden layer matrix (feature matrix) given by:

$$\begin{aligned} H = \left[ \begin{array}{ccc} g(w_{1}x_{1} + b_{1}) & \cdots & g(w_{n}x_{1} + b_{n}) \\ \vdots & \ddots & \vdots \\ g(w_{1}x_{N} + b_{1}) & \cdots & g(w_{n}x_{N} + b_{n}) \end{array} \right] _{N \times n} \end{aligned}$$
(A2)

The linear system \(H\beta = T\) is solved by employing the Moore-Penrose generalized inverse (pseudo-inverse) of the H matrix, denoted \(H^{\dagger }\). There are different ways to compute the pseudo-inverse of a matrix, including orthogonal projection, iterative methods, and singular value decomposition (SVD) [7]. The ELM solution and the computation of the \(H^{\dagger }\) matrix are given by:

$$\begin{aligned} H^{\dagger } = (H^{T}H)^{-1}H^{T} \end{aligned}$$
(A3)
$$\begin{aligned} \beta = H^{\dagger }T \longrightarrow \beta = (H^{T}H)^{-1}H^{T}T \end{aligned}$$
(A4)

Adding a small value to the diagonal of \(H^{T}H\) guarantees that the matrix is non-singular, enabling the orthogonal projection method to be used to compute the pseudo-inverse. As demonstrated in [25], this regularization factor makes the obtained solution more stable and gives it greater generalization ability. Thus, for a given regularization factor \(\alpha \), the \(H^{\dagger }\) and \(\beta \) matrices can be obtained as follows:

$$\begin{aligned} H^{\dagger } = (H^{T}H + \alpha I)^{-1}H^{T} \end{aligned}$$
(A5)
$$\begin{aligned} \beta = (H^{T}H + \alpha I)^{-1}H^{T}T \end{aligned}$$
(A6)

The ELM training pseudocode, as proposed by [24], is described in Algorithm 3.

Algorithm 3 ELM training
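For readers who prefer code to pseudocode, the following NumPy sketch implements the training and prediction steps above: the hidden-layer mapping (A2) and the regularized solution (A5)-(A6). It is a minimal illustration rather than the authors' implementation; the function names, the sigmoid activation, and the one-hot target encoding are choices made for this example.

```python
import numpy as np

def elm_train(X, T, n_hidden, alpha, seed=0):
    """Regularized ELM training: random input weights, analytic output weights via (A6).

    X: (N, d) training samples; T: (N, m) one-hot targets.
    Returns W, b and beta = (H^T H + alpha*I)^{-1} H^T T.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, n_hidden))      # random input weights, never adjusted
    b = rng.standard_normal(n_hidden)           # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))      # hidden-layer matrix (A2), sigmoid activation
    beta = np.linalg.solve(H.T @ H + alpha * np.eye(n_hidden), H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Predicted class = index of the largest network output."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)

# Toy usage: 200 random samples, 10 features, 3 classes with one-hot targets.
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 10))
y = rng.integers(0, 3, size=200)
T = np.eye(3)[y]
W, b, beta = elm_train(X, T, n_hidden=50, alpha=1e-2)
print("training accuracy:", np.mean(elm_predict(X, W, b, beta) == y))
```

Replacing the regularized solve with np.linalg.pinv(H) @ T would give the unregularized solution (A4), with the pseudo-inverse computed via SVD.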

Table 15 Standard ELM computational complexity, based on [2]
Table 16 SSOE-FP-ELM computational complexity for each training partition k, based on [2]

Appendix B: Time complexity of SSOE-FP-ELM

For many classification problems, the total number of training samples N and the number of neurons n are the largest quantities, generally greater than the number of features d or the number of classes c. Therefore, when computing the ELM time complexity, the parameters in decreasing order of relevance are: the number of training samples and the number of neurons; the number of features; the number of classes.

Considering the dimensions of the input matrices \(X_{[N \times d]}\), \(W_{[d \times n]}\) and \(T_{[N \times c]}\), and the greater impact of the parameters N and n on computational time, Table 15 presents the computational complexity of each operation in Standard ELM, together with the final complexity. It is important to note that in classification problems with a large number of training samples, the term \(n^{2}N\) dominates the final time complexity [10]. In problems where the number of neurons n is greater than the number of training samples N, the term \(n^{3}\) dominates. Both operations are performed only once, during the computation of \(\beta \).
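As a check on where these terms come from, the dominant cost of each step in the batch solution (A6) can be broken down with a standard dense-matrix operation count (the rows of Table 15 may group the terms differently):

$$\begin{aligned} H = g(XW + b)&: \quad \mathcal {O}(Ndn) \\ H^{T}H&: \quad \mathcal {O}(n^{2}N) \\ (H^{T}H + \alpha I)^{-1}&: \quad \mathcal {O}(n^{3}) \\ H^{T}T&: \quad \mathcal {O}(nNc) \\ \beta = (H^{T}H + \alpha I)^{-1}H^{T}T&: \quad \mathcal {O}(Ndn + n^{2}N + n^{3} + nNc) \approx \mathcal {O}(n^{2}N + n^{3}) \end{aligned}$$

where the approximation uses the assumption above that N and n are much larger than d and c; the \(n^{2}N\) term dominates when \(N > n\), and the \(n^{3}\) term dominates when \(n > N\).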

In the SSOE-FP-ELM algorithm, which performs training in an online manner, the total number of training samples N is replaced by the size \(N_{k}\) of each training partition in the dimensions of the input matrices and in the calculations performed. Generally, the size of a training partition is smaller than the number of neurons, making n the most important parameter when estimating the computational time of SSOE-FP-ELM.
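To make the per-partition cost concrete, the sketch below implements a plain online regularized ELM that accumulates \(H^{T}H\) and \(H^{T}T\) chunk by chunk and re-solves for \(\beta \) after each partition. This is only an illustration under simplifying assumptions: it omits the semi-supervised matrices L and J, the online-update matrices U and V, and the forgetting parameter of SSOE-FP-ELM, and the class and method names are invented for the example. It does, however, exhibit the same cost profile: roughly \(\mathcal {O}(n^{2}N_{k})\) for the accumulation plus \(\mathcal {O}(n^{3})\) for the solve at each partition, independent of the total stream length.

```python
import numpy as np

class OnlineRegularizedELM:
    """Minimal online ELM: accumulates H^T H and H^T T per chunk and re-solves for beta.

    Illustrative only; it does not include the semi-supervised terms or the
    forgetting parameter of SSOE-FP-ELM.
    """

    def __init__(self, n_features, n_hidden, n_classes, alpha, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_features, n_hidden))   # fixed random input weights
        self.b = rng.standard_normal(n_hidden)
        self.K = alpha * np.eye(n_hidden)                      # running H^T H + alpha*I
        self.c = np.zeros((n_hidden, n_classes))               # running H^T T
        self.beta = np.zeros((n_hidden, n_classes))

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))    # sigmoid hidden layer

    def partial_fit(self, X_k, T_k):
        """Update with one training partition of N_k samples."""
        H = self._hidden(X_k)                        # (N_k, n)
        self.K += H.T @ H                            # O(n^2 * N_k)
        self.c += H.T @ T_k                          # O(n * N_k * c)
        self.beta = np.linalg.solve(self.K, self.c)  # O(n^3), dominant when N_k < n
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)
```

When the partitions are small relative to the number of neurons, the \(\mathcal {O}(n^{3})\) solve for the \(\beta \) matrix dominates each update, which is consistent with n being the decisive parameter in the SSOE-FP-ELM cost analysis.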

Table 16 presents the computational complexity of each operation in SSOE-FP-ELM, and the final complexity. In addition to the standard ELM operations, SSOE-FP-ELM computes the intermediate matrices L and J (for semi-supervised learning) and U and V (for the online update) for each training partition. SSOE-FP-ELM has a longer training time than SSOE-ELM, due to the supervised factor of the forgetting parameter: to assess the accuracy of the model from one training partition to the next and check whether a label concept drift has occurred, it is necessary to compute the \(\beta \) output weight matrix, which is the operation with the highest computational cost. Thus, at each training partition SSOE-FP-ELM needs to carry out all operations (calculate the intermediate matrices L, J, U and V, compute the \(\beta \) matrix, and calculate the forgetting parameter), unlike SSOE-ELM, which does not need to compute \(\beta \) at every partition.

It is important to note that the SSOE-FP-ELM algorithm is based on SSOE-ELM, so the operations of the semi-supervised learning and online update steps are the same for both algorithms. The difference between the two algorithms, in terms of computational time, is the calculation of the semi-supervised forgetting parameter in the SSOE-FP-ELM.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

da Silva, C.A.S., Krohling, R.A. An extreme learning machine algorithm for semi-supervised classification of unbalanced data streams with concept drift. Multimed Tools Appl 83, 37549–37588 (2024). https://doi.org/10.1007/s11042-023-17039-5
