
An extreme learning machine algorithm for semi-supervised classification of unbalanced data streams with concept drift

Published in Multimedia Tools and Applications

Abstract

Data streams are important sources of information nowadays, and with the popularization of mobile devices and sensor systems that collect all kinds of data, more and more information is generated at an ever-increasing speed. This growth in data supply poses problems for traditional machine learning algorithms. Tasks such as classification, regression, and clustering face limitations when applied to very large datasets, data streams, or changing data. The high cost of labeling instances for training classification algorithms makes it difficult to use fully supervised algorithms. Unbalanced datasets tend to cause algorithms to ignore one or more classes. Moreover, concept drifts in data streams require algorithms to be retrained from time to time. To tackle these problems, a semi-supervised and online algorithm based on the Extreme Learning Machine (ELM), called SSOE-FP-ELM, is proposed and detailed. Experimental results show that the proposed algorithm outperforms others in the literature in accuracy, generalization ability, and concept drift detection and recovery, making it a suitable alternative for data stream classification.


Data Availability

The COIL20 and COIL100 datasets were collected from the Center for Research on Intelligent Systems at the Department of Computer Science, Columbia University [35, 36]. The KDEF_Front dataset is a subset of the KDEF dataset [33], using only frontal face images converted to grayscale and reduced to 30x30 pixels. The DNA dataset is available at the LIBSVM datasets repository [15, 47]. All other datasets (CrowdSourcedMapping, Gisette, Isolet, Musk, Spambase, StatlogImageSegmentation, StatlogLandSatellite, Waveform) are available at the UCI Machine Learning Repository [14].

References

  1. Agrahari S, Singh AK (2022) Concept drift detection in data stream mining: a literature review. J King Saud Univ Comput Inf Sci 34(10, Part B):9523–9540. https://doi.org/10.1016/j.jksuci.2021.11.006

  2. Akusok A, Björk KM, Miche Y et al (2015) High-performance extreme learning machines: a complete toolbox for big data applications. IEEE Access 3:1011–1025. https://doi.org/10.1109/ACCESS.2015.2450498

  3. Anderson R, Koh YS, Dobbie G et al (2019) Recurring concept meta-learning for evolving data streams. Expert Syst Appl 138:112832. https://doi.org/10.1016/j.eswa.2019.112832

  4. Barros RSM, Santos SGTC (2018) A large-scale comparison of concept drift detectors. Inf Sci 451–452:348–370. https://doi.org/10.1016/j.ins.2018.04.014

  5. de Barros RSM, Hidalgo JIG, de Lima Cabral DR (2018) Wilcoxon rank sum test drift detector. Neurocomputing 275:1954–1963. https://doi.org/10.1016/j.neucom.2017.10.051

  6. Bartlett PL (1998) The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Trans Inf Theory 44(2):525–536. https://doi.org/10.1109/18.661502

  7. Ben-Israel A, Greville TN (2003) Generalized inverses: theory and applications, vol 15. Springer Science & Business Media

  8. Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13(Feb):281–305. https://dl.acm.org/doi/abs/10.5555/2188385.2188395

  9. Budiman A, Fanany MI, Basaruddin C (2016) Adaptive online sequential ELM for concept drift tackling. Comput Intell Neurosci 2016. https://doi.org/10.1155/2016/8091267

  10. Cormen TH, Leiserson CE, Rivest RL et al (2009) Introduction to algorithms. MIT press

  11. da Costa FG, Rios RA, de Mello RF (2016) Using dynamical systems tools to detect concept drift in data streams. Expert Syst Appl 60:39–50. https://doi.org/10.1016/j.eswa.2016.04.026

  12. Derrac J, García S, Molina D et al (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol Comput 1(1):3–18. https://doi.org/10.1016/j.swevo.2011.02.002

  13. Du Prel JB, Röhrig B, Hommel G et al (2010) Choosing statistical tests: part 12 of a series on evaluation of scientific publications. Dtsch Arztebl Int 107(19):343. https://doi.org/10.3238/arztebl.2010.0343

  14. Dua D, Graff C (2019) UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed 30 Nov 2019

  15. Fan RE, Lin CJ (2019) LIBSVM Data: classification, regression, and multi-label. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/. Accessed 30 Nov 2019

  16. Gama J, Žliobaitė I, Bifet A et al (2014) A survey on concept drift adaptation. ACM Comput Surv 46(4). https://doi.org/10.1145/2523813

  17. Gu X (2023) A dual-model semi-supervised self-organizing fuzzy inference system for data stream classification. Appl Soft Comput 136:110053. https://doi.org/10.1016/j.asoc.2023.110053

  18. Han F, Yao HF, Ling QH (2013) An improved evolutionary extreme learning machine based on particle swarm optimization. Neurocomputing 116:87–93. https://doi.org/10.1016/j.neucom.2011.12.062

  19. Hayashi Y, Sakata M, Gallant SI (1990) Multi-layer versus single-layer neural networks and an application to reading hand-stamped characters. In: International neural network conference, Springer, pp 781–784. https://doi.org/10.1007/978-94-009-0643-3_74

  20. He X, Cai D, Niyogi P (2005) Laplacian score for feature selection. In: Advances in neural information processing systems, pp 507–514

  21. Hsu CW, Chang CC, Lin CJ (2003) A practical guide to support vector classification (technical report). Department of Computer Science and Information Engineering, National Taiwan University, Taipei. www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf. Accessed 30 Nov 2019

  22. Huang G, Song S, Gupta JN et al (2014) Semi-supervised and unsupervised extreme learning machines. IEEE Trans Cybern 44(12):2405–2417. https://doi.org/10.1109/TCYB.2014.2307349

  23. Huang GB, Zhu QY, Siew CK (2004) Extreme learning machine: a new learning scheme of feedforward neural networks. In: 2004 IEEE International joint conference on neural networks, IEEE, pp 985–990. https://doi.org/10.1109/IJCNN.2004.1380068

  24. Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1):489–501. https://doi.org/10.1016/j.neucom.2005.12.126

  25. Huang GB, Zhou H, Ding X et al (2012) Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern Part B Cybern 42(2):513–529. https://doi.org/10.1109/TSMCB.2011.2168604

  26. Jia X, Wang R, Liu J et al (2016) A semi-supervised online sequential extreme learning machine method. Neurocomputing 174:168–178. https://doi.org/10.1016/j.neucom.2015.04.102

  27. Lei Y, Chen X, Min M et al (2020) A semi-supervised Laplacian extreme learning machine and feature fusion with CNN for industrial superheat identification. Neurocomputing 381:186–195. https://doi.org/10.1016/j.neucom.2019.11.012

  28. Li L, Sun R, Cai S et al (2019) A review of improved extreme learning machine methods for data stream classification. Multimed Tools Appl 78:33375–33400. https://doi.org/10.1007/s11042-019-7543-2

  29. Li Q, Xiong Q, Ji S et al (2021) Incremental semi-supervised extreme learning machine for mixed data stream classification. Expert Syst Appl 185:115591. https://doi.org/10.1016/j.eswa.2021.115591

  30. Li Y, Wang Y, Liu Q et al (2019) Incremental semi-supervised learning on streaming data. Pattern Recognit 88:383–396. https://doi.org/10.1016/j.patcog.2018.11.006

  31. Liang NY, Huang GB, Saratchandran P et al (2006) A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Trans Neural Netw 17(6):1411–1423. https://doi.org/10.1109/TNN.2006.880583

  32. Liu D, Wu Y, Jiang H (2016) FP-ELM: an online sequential learning algorithm for dealing with concept drift. Neurocomputing 207:322–334. https://doi.org/10.1016/j.neucom.2016.04.043

  33. Lundqvist D, Flykt A, Öhman A (1998) The Karolinska directed emotional faces (KDEF). CD ROM from Department of Clinical Neuroscience, Psychology section, Karolinska Institutet

  34. Ma L, Ma A, Ju C et al (2016) Graph-based semi-supervised learning for spectral-spatial hyperspectral image classification. Pattern Recogn Lett 83:133–142. https://doi.org/10.1016/j.patrec.2016.01.022

  35. Nene SA, Nayar SK, Murase H (1996a) Columbia object image library (COIL-100)

  36. Nene SA, Nayar SK, Murase H (1996b) Columbia object image library (COIL-20)

  37. Pao YH, Takefuji Y (1992) Functional-link net computing: theory, system architecture, and functionalities. Computer 25(5):76–79. https://doi.org/10.1109/2.144401

  38. Pao YH, Park GH, Sobajic DJ (1994) Learning and generalization characteristics of the random vector functional-link net. Neurocomputing 6(2):163–180. https://doi.org/10.1016/0925-2312(94)90053-1

  39. Qiu S, Li P, Hu X (2022) Semi-supervised online kernel extreme learning machine for multi-label data stream classification. In: 2022 International joint conference on neural networks (IJCNN), pp 1–8. https://doi.org/10.1109/IJCNN55064.2022.9892701

  40. Rumelhart DE, Hinton GE, Williams RJ et al (1988) Learning representations by back-propagating errors. Cognit Model 5(3):1. https://doi.org/10.1038/323533a0

  41. Sawant SS, Prabukumar M (2018) A review on graph-based semi-supervised learning methods for hyperspectral image classification. Egypt J Remote Sens Space Sci. https://doi.org/10.1016/j.ejrs.2018.11.001

  42. Schmidt WF, Kraaijveld MA, Duin RP et al (1992) Feed forward neural networks with random weights. In: International conference on pattern recognition, IEEE Computer Society Press, pp 1–1

  43. Sethi TS, Kantardzic M (2017) On the reliable detection of concept drift from streaming unlabeled data. Expert Syst Appl 82:77–99. https://doi.org/10.1016/j.eswa.2017.04.008

  44. da Silva CAS, Krohling RA (2018) Semi-supervised online elastic extreme learning machine for data classification. In: 2018 IEEE IJCNN - International Joint Conference on Neural Networks, IEEE, pp 1511–1518. https://doi.org/10.1109/IJCNN.2018.8489632

  45. da Silva CAS, Krohling RA (2019) Semi-supervised online elastic extreme learning machine with forgetting parameter to deal with concept drift in data streams. In: 2019 IEEE IJCNN - International joint conference on neural networks, IEEE, pp 1–8. https://doi.org/10.1109/IJCNN.2019.8852361

  46. Tsymbal A (2004) The problem of concept drift: definitions and related work. Comput Sci Dep Trinity College Dublin 106(2):58

  47. Vanschoren J, van Rijn JN, Bischl B et al (2013) OpenML: networked science in machine learning. SIGKDD Explorations 15(2):49–60. https://doi.org/10.1145/2641190.2641198

  48. Wang H, Abraham Z (2015) Concept drift detection for streaming data. In: International joint conference on neural networks (IJCNN), IEEE, pp 1–9. https://doi.org/10.1109/IJCNN.2015.7280398

  49. Wang J, Lu S, Wang SH et al (2022) A review on extreme learning machine. Multimed Tools Appl 81(29):41611–41660. https://doi.org/10.1007/s11042-021-11007-7

  50. Xie J, Liu S, Dai H (2019) Manifold regularization based distributed semi-supervised learning algorithm using extreme learning machine over time-varying network. Neurocomputing 355:24–34. https://doi.org/10.1016/j.neucom.2019.03.079

  51. Xin J, Wang Z, Qu L et al (2015) Elastic extreme learning machine for big data classification. Neurocomputing 149:464–471. https://doi.org/10.1016/j.neucom.2013.09.075

  52. Xu S, Wang J (2017) Dynamic extreme learning machine for data stream classification. Neurocomputing 238:433–449. https://doi.org/10.1016/j.neucom.2016.12.078

  53. Yan J, Cao Y, Kang B et al (2021) An elm-based semi-supervised indoor localization technique with clustering analysis and feature extraction. IEEE Sensors J 21(3):3635–3644. https://doi.org/10.1109/JSEN.2020.3028579

  54. Yang Z, Cohen WW, Salakhutdinov R (2016) Revisiting semi-supervised learning with graph embeddings. In: Proceedings of The 33rd International Conference on Machine Learning, pp 40–48. http://proceedings.mlr.press/v48/yanga16.html

  55. Zhang Z, Cai Y, Gong W (2023) Semi-supervised learning with graph convolutional extreme learning machines. Expert Syst Appl 213:119164. https://doi.org/10.1016/j.eswa.2022.119164

  56. Zhao J, Wang Z, Park DS (2012) Online sequential extreme learning machine with forgetting mechanism. Neurocomputing 87:79–89. https://doi.org/10.1016/j.neucom.2012.02.003

  57. Zheng X, Li P, Wu X (2022) Data stream classification based on extreme learning machine: a review. Big Data Res 30:100356. https://doi.org/10.1016/j.bdr.2022.100356


Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. R.A. Krohling would like to thank the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) - grant n. 309729/2018-1 - and the Fundação de Amparo a Pesquisa e Inovação do Espírito Santo (FAPES) - grant n. 575/2018. The authors would like to thank the support of NVIDIA Corporation with the donation of the Titan X GPU used for this research.

Author information

Corresponding author

Correspondence to Carlos A. S. da Silva.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Standard ELM

Unlike gradient descent-based learning methods, ELM does not perform an iterative adjustment of weights. The input weights of the network are randomly generated and kept fixed during the training phase, while the output weights are calculated analytically using the least squares method [24]. As a consequence, the training time of ELM is a few hundred (or even thousand) times shorter than that of the backpropagation algorithm, according to [24]. Furthermore, ELM is not susceptible to getting stuck in local minima or to the slow convergence common in backpropagation.

Fig. 12 Illustration of an SLFN trained with ELM

Figure 12 shows an SLFN trained with the ELM algorithm. Consider a training set \((x_{i}, t_{i}), x_{i} \in R^d, t_{i} \in R^m, i = 1, \ldots , N\), where N is the number of training samples, d is the number of features and m is the number of classes; an activation function g(x), a sigmoid function for example; and a number n of neurons in the hidden layer. The relationship between the input data \(x_{i}\) and the network output values \(t_{i}\) is given by:

$$\begin{aligned} t_{i} = \sum _{j=1}^{n}\beta _{j}g(w_{j}x_{i} + b_{j}), \quad i = 1, \ldots , N \end{aligned}$$
(A1)

where \(\beta _{j}\) is the output weight vector of the j-th hidden neuron, and \(w_{j}\) and \(b_{j}\) are its input weights and bias. In ELM, \(w_{j}\) and \(b_{j}\) are set randomly before the training phase and are not adjusted.

Equation (A1) can be written in the matrix form \(H\beta = T\), where H is the hidden layer matrix (feature matrix) given by:

$$\begin{aligned} H = \left[ \begin{array}{ccc} g(w_{1}x_{1} + b_{1}) & \cdots & g(w_{n}x_{1} + b_{n}) \\ \vdots & \ddots & \vdots \\ g(w_{1}x_{N} + b_{1}) & \cdots & g(w_{n}x_{N} + b_{n}) \end{array} \right] _{N \times n} \end{aligned}$$
(A2)

The linear system \(H\beta = T\) is solved by employing the Moore-Penrose generalized inverse (pseudo-inverse) of the H matrix, denoted \(H^{\dagger }\). There are different ways to compute the pseudo-inverse of a matrix, including orthogonal projection, iterative methods, and singular value decomposition (SVD) [7]. The ELM solution and the computation of the \(H^{\dagger }\) matrix are given by:

$$\begin{aligned} H^{\dagger } = (H^{T}H)^{-1}H^{T} \end{aligned}$$
(A3)
$$\begin{aligned} \beta = H^{\dagger }T \longrightarrow \beta = (H^{T}H)^{-1}H^{T}T \end{aligned}$$
(A4)

Adding a small value to the diagonal of \(H^{T}H\) guarantees that the matrix is non-singular, enabling the orthogonal projection method to be used to compute the pseudo-inverse. As demonstrated in [25], this regularization factor makes the obtained solution more stable and gives it greater generalization ability. Thus, for a given regularization factor \(\alpha \), the \(H^{\dagger }\) and \(\beta \) matrices can be obtained as follows:

$$\begin{aligned} H^{\dagger } = (H^{T}H + \alpha I)^{-1}H^{T} \end{aligned}$$
(A5)
$$\begin{aligned} \beta = (H^{T}H + \alpha I)^{-1}H^{T}T \end{aligned}$$
(A6)

The ELM training pseudocode, as proposed by [24], is described in Algorithm 3.

Algorithm 3 ELM training
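For readers who prefer code to pseudocode, the following NumPy sketch implements the training and prediction steps above: the hidden-layer mapping (A2) and the regularized solution (A5)-(A6). It is a minimal illustration rather than the authors' implementation; the function names, the sigmoid activation, and the one-hot target encoding are choices made for this example.

```python
import numpy as np

def elm_train(X, T, n_hidden, alpha, seed=0):
    """Regularized ELM training: random input weights, analytic output weights via (A6).

    X: (N, d) training samples; T: (N, m) one-hot targets.
    Returns W, b and beta = (H^T H + alpha*I)^{-1} H^T T.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, n_hidden))      # random input weights, never adjusted
    b = rng.standard_normal(n_hidden)           # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))      # hidden-layer matrix (A2), sigmoid activation
    beta = np.linalg.solve(H.T @ H + alpha * np.eye(n_hidden), H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Predicted class = index of the largest network output."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)

# Toy usage: 200 random samples, 10 features, 3 classes with one-hot targets.
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 10))
y = rng.integers(0, 3, size=200)
T = np.eye(3)[y]
W, b, beta = elm_train(X, T, n_hidden=50, alpha=1e-2)
print("training accuracy:", np.mean(elm_predict(X, W, b, beta) == y))
```

Replacing the regularized solve with np.linalg.pinv(H) @ T would give the unregularized solution (A4), with the pseudo-inverse computed via SVD.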

Table 15 Standard ELM computational complexity, based on [2]
Table 16 SSOE-FP-ELM computational complexity for each training partition k, based on [2]

Appendix B: Time complexity of SSOE-FP-ELM

For many classification problems, the total number of training samples N and the number of neurons n are the largest quantities, generally greater than the number of features d or the number of classes c. Therefore, when computing the ELM time complexity, the parameters in decreasing order of relevance are: the number of training samples and the number of neurons; the number of features; the number of classes.

Considering the dimensions of the input matrices \(X_{[N \times d]}\), \(W_{[d \times n]}\) and \(T_{[N \times c]}\), and the greater impact of the parameters N and n on computational time, Table 15 presents the computational complexity of each operation in Standard ELM, together with the final complexity. It is important to note that in classification problems with a large number of training samples, the term \(n^{2}N\) dominates the final time complexity [10]. In problems where the number of neurons n is greater than the number of training samples N, the term \(n^{3}\) dominates. Both operations are performed only once, during the computation of \(\beta \).
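As a check on where these terms come from, the dominant cost of each step in the batch solution (A6) can be broken down with a standard dense-matrix operation count (the rows of Table 15 may group the terms differently):

$$\begin{aligned} H = g(XW + b)&: \quad \mathcal {O}(Ndn) \\ H^{T}H&: \quad \mathcal {O}(n^{2}N) \\ (H^{T}H + \alpha I)^{-1}&: \quad \mathcal {O}(n^{3}) \\ H^{T}T&: \quad \mathcal {O}(nNc) \\ \beta = (H^{T}H + \alpha I)^{-1}H^{T}T&: \quad \mathcal {O}(Ndn + n^{2}N + n^{3} + nNc) \approx \mathcal {O}(n^{2}N + n^{3}) \end{aligned}$$

where the approximation uses the assumption above that N and n are much larger than d and c; the \(n^{2}N\) term dominates when \(N > n\), and the \(n^{3}\) term dominates when \(n > N\).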

In the SSOE-FP-ELM algorithm, which performs training in an online manner, the total number of training samples N is replaced by the size \(N_{k}\) of each training partition in the dimensions of the input matrices and in the calculations performed. Generally, the size of a training partition is smaller than the number of neurons, making n the most important parameter when estimating the computational time of SSOE-FP-ELM.
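To make the per-partition cost concrete, the sketch below implements a plain online regularized ELM that accumulates \(H^{T}H\) and \(H^{T}T\) chunk by chunk and re-solves for \(\beta \) after each partition. This is only an illustration under simplifying assumptions: it omits the semi-supervised matrices L and J, the online-update matrices U and V, and the forgetting parameter of SSOE-FP-ELM, and the class and method names are invented for the example. It does, however, exhibit the same cost profile: roughly \(\mathcal {O}(n^{2}N_{k})\) for the accumulation plus \(\mathcal {O}(n^{3})\) for the solve at each partition, independent of the total stream length.

```python
import numpy as np

class OnlineRegularizedELM:
    """Minimal online ELM: accumulates H^T H and H^T T per chunk and re-solves for beta.

    Illustrative only; it does not include the semi-supervised terms or the
    forgetting parameter of SSOE-FP-ELM.
    """

    def __init__(self, n_features, n_hidden, n_classes, alpha, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_features, n_hidden))   # fixed random input weights
        self.b = rng.standard_normal(n_hidden)
        self.K = alpha * np.eye(n_hidden)                      # running H^T H + alpha*I
        self.c = np.zeros((n_hidden, n_classes))               # running H^T T
        self.beta = np.zeros((n_hidden, n_classes))

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))    # sigmoid hidden layer

    def partial_fit(self, X_k, T_k):
        """Update with one training partition of N_k samples."""
        H = self._hidden(X_k)                        # (N_k, n)
        self.K += H.T @ H                            # O(n^2 * N_k)
        self.c += H.T @ T_k                          # O(n * N_k * c)
        self.beta = np.linalg.solve(self.K, self.c)  # O(n^3), dominant when N_k < n
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)
```

When the partitions are small relative to the number of neurons, the \(\mathcal {O}(n^{3})\) solve for the \(\beta \) matrix dominates each update, which is consistent with n being the decisive parameter in the SSOE-FP-ELM cost analysis.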

Table 16 presents the computational complexity of each operation in SSOE-FP-ELM, and the final complexity. In addition to the standard ELM operations, SSOE-FP-ELM computes the intermediate matrices L and J (for semi-supervised learning) and U and V (for the online update) for each training partition. SSOE-FP-ELM has a longer training time than SSOE-ELM, due to the supervised factor of the forgetting parameter: to assess the accuracy of the model from one training partition to the next and check whether a label concept drift has occurred, it is necessary to compute the \(\beta \) output weight matrix, which is the operation with the highest computational cost. Thus, at each training partition SSOE-FP-ELM needs to carry out all operations (calculate the intermediate matrices L, J, U and V, compute the \(\beta \) matrix, and calculate the forgetting parameter), unlike SSOE-ELM, which does not need to compute \(\beta \) at every partition.

It is important to note that the SSOE-FP-ELM algorithm is based on SSOE-ELM, so the operations of the semi-supervised learning and online update steps are the same for both algorithms. The difference between the two algorithms, in terms of computational time, is the calculation of the semi-supervised forgetting parameter in the SSOE-FP-ELM.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

da Silva, C.A.S., Krohling, R.A. An extreme learning machine algorithm for semi-supervised classification of unbalanced data streams with concept drift. Multimed Tools Appl 83, 37549–37588 (2024). https://doi.org/10.1007/s11042-023-17039-5
