The effective BRKGA algorithm for the k -medoids clustering problem
RAIRO. Operations Research, Tome 56 (2022) no. 4, pp. 3137-3153

This paper presents a biased random-key genetic algorithm for k-medoids clustering problem. A novel heuristic operator was implemented and combined with a parallelized local search procedure. Experiments were carried out with fifty literature data sets with small, medium, and large sizes, considering several numbers of clusters, showed that the proposed algorithm outperformed eight other algorithms, for example, the classics PAM and CLARA algorithms. Furthermore, with the results of a linear integer programming formulation, we found that our algorithm obtained the global optimal solutions for most cases and, despite its stochastic nature, presented stability in terms of quality of the solutions obtained and the number of generations required to produce such solutions. In addition, considering the solutions (clusterings) produced by the algorithms, a relative validation index (average silhouette) was applied, where, again, was observed that our method performed well, producing cluster with a good structure.

DOI : 10.1051/ro/2022141
Classification : 90C59, 62H30
Keywords: $$-Medoids, Optimization, metaheuristics, Rio de Janeiro, Brazilmathematical programming
@article{RO_2022__56_4_3137_0,
     author = {Brito, Jose Andre and Semaan, Gustavo and Fadel, Augusto},
     title = {The effective {BRKGA} algorithm for the $k$-medoids clustering problem},
     journal = {RAIRO. Operations Research},
     pages = {3137--3153},
     year = {2022},
     publisher = {EDP-Sciences},
     volume = {56},
     number = {4},
     doi = {10.1051/ro/2022141},
     mrnumber = {4474769},
     zbl = {1500.90055},
     language = {en},
     url = {https://www.numdam.org/articles/10.1051/ro/2022141/}
}
TY  - JOUR
AU  - Brito, Jose Andre
AU  - Semaan, Gustavo
AU  - Fadel, Augusto
TI  - The effective BRKGA algorithm for the $k$-medoids clustering problem
JO  - RAIRO. Operations Research
PY  - 2022
SP  - 3137
EP  - 3153
VL  - 56
IS  - 4
PB  - EDP-Sciences
UR  - https://www.numdam.org/articles/10.1051/ro/2022141/
DO  - 10.1051/ro/2022141
LA  - en
ID  - RO_2022__56_4_3137_0
ER  - 
%0 Journal Article
%A Brito, Jose Andre
%A Semaan, Gustavo
%A Fadel, Augusto
%T The effective BRKGA algorithm for the $k$-medoids clustering problem
%J RAIRO. Operations Research
%D 2022
%P 3137-3153
%V 56
%N 4
%I EDP-Sciences
%U https://www.numdam.org/articles/10.1051/ro/2022141/
%R 10.1051/ro/2022141
%G en
%F RO_2022__56_4_3137_0
Brito, Jose Andre; Semaan, Gustavo; Fadel, Augusto. The effective BRKGA algorithm for the $k$-medoids clustering problem. RAIRO. Operations Research, Tome 56 (2022) no. 4, pp. 3137-3153. doi: 10.1051/ro/2022141

[1] D. Aloise and C. Contardo, A new global optimization algorithm for diameter minimization clusteringIn: Proceedings of Global Optimization Workshop (GOW16), Portugal, edited by D. Aloise. University of Minho/Algoritmi Research Centre. (2016) 171–174.

[2] J. A. M. Brito, T. M. Veiga and P. L. N. Silva, An optimization algorithm applied to the one-dimensional stratification problem. Surv. Methodol. 45 (2019) 295–315.

[3] R. Campello, E. R. Hruschka and V. Alves, On the efficiency of evolutionary fuzzy clustering. J. Heuristics 15 (2009) 43–75. | Zbl | DOI

[4] S. Chu, J. F. Roddick and J. Pan, Improved search strategies and extensions to k-medoids-based clustering algorithms. Int. J. Bus. Intell. Data Min. 3 (2008) 212–231.

[5] M. Daskin, Network and Discrete Location: Models, Algorithms, and Applications, 2nd edition. John Wiley & Sons (2013). | MR | Zbl

[6] R. Dinata, S. Retno and N. Hasdyna, Minimization of the number of iterations in k-medoids clustering with purity algorithm. Rev. Intell. Artif. 35 (2021) 193–199.

[7] A. C. Fadel, L. S. Ochi, J. A. M. Brito and G. S. Semaan, Microaggregation heuristic applied to statistical disclosure control. Inf. Sci. 548 (2021) 37–55. | MR | DOI

[8] E. Falkenauer, Genetic Algorithms and Grouping Problems. John Wiley & Sons (1998).

[9] P. Festa, A biased random-key genetic algorithm for data clustering. Math. Biosci. 245 (2013) 76–85. | MR | Zbl | DOI

[10] M. Garey and D. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company (1979). | MR | Zbl

[11] J. Gonçalves and M. Resende, Biased random-key genetic algorithms for combinatorial optimization. J. Heuristics 17 (2011) 487–525. | DOI

[12] D. Gunopulos, Clustering overview and applications. In: Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu. Springer, Boston, MA (2009). | DOI

[13] J. Hair, W. Black, B. Babin and R. Anderson, Multivariate Data Analysis, 8th edition. Cengage Learning (2018).

[14] S. Hakimi, Optimum location of switching centers and the absolute centers and medians of a graph. Oper. Res. 12 (1964) 450–459. | Zbl | DOI

[15] J. Han and R. Ng, Clarans: a method for clustering objects for spatial datamining. IEEE Trans. Knowl. Data Eng. 14 (2002) 1003–1016. | DOI

[16] J. Han, J. Pei and H. Tong, Data Mining: Concepts and Techniques, 4th edition. Morgan Kaufmann Publishers 2022).

[17] P. Hansen and B. Jaumard, Cluster analysis and mathematical programming. Math. Program. 79 (1997) 191–215. | MR | Zbl | DOI

[18] A. Hudaib, M. Khanafseh and O. Surakhi, An improved version of k -medoid algorithm using CRO. Modern Appl. Sci. 12 (2018) 116–127. | DOI

[19] R. A. Johnson and W. D. Wichern, Applied Multivariate Statistical Analysis, 6th edition. Pearson (2018). | MR

[20] L. Kaufman and P. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. Wiley-Interscience (1990). | MR | Zbl | DOI

[21] M. Kong, J. Pei, H. Cheng and P. Pardalos, A BRKGA-de algorithm for parallel-batching scheduling with deterioration and learning effects on parallel machines under preventive maintenance consideration. Ann. Math. Artif. Intell. 88 (2020) 237–267. | MR | DOI

[22] G. Laporte, S. Nickel and F. Da Gama, Location Science, 2nd edition. Springer (2019). | Zbl | MR

[23] C. Lucasius, A. Dane and G. Kateman, On k -medoid clustering of large data sets with the aid of a genetic algorithm: background, feasibility and comparison. Anal. Chim. Acta 282 (1993) 647–669. | DOI

[24] R. Mart, P. Pardalos and M. Resende, Handbook of Heuristics, 1st edition. Springer (2018). | Zbl

[25] N. Megiddo and K. Supowit, On the complexity of some common geometric location problems. SIAM J. Comput. 13 (1984) 182–196. | MR | Zbl | DOI

[26] M. Nascimento, F. Toledo and A. De Carvalho, Investigation of a new grasp- based clustering algorithm applied to biological data. Comput. Oper. Res. 37 (2010) 1381–1388. | Zbl | DOI

[27] M. Nascimento, F. Toledo and A. Carvalho, A hybrid heuristic for the k -medoids clustering problem. In: GECCO’12: Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation. Association for Computing Machinery, New York, NY, USA (2012) 417–424.

[28] M. Negreiros and A. Palhano, The capacitated centred clustering problem. Comput. Oper. Res. 33 (2006) 1639–1663. | Zbl | DOI

[29] M. Negreiros, N. Maculan, P. Batista, J. Rodrigues and A. Palhano, Capacitated clustering problems applied to the layout of it-teams in software factories. Ann. Oper. Res. (2020). DOI: . | DOI | MR | Zbl

[30] R. Oliveira, A. Chaves and L. Lorena, A comparison of two hybrid methods for constrained clustering problems. Appl. Soft Comput. 54 (2017) 256–266. | DOI

[31] M. K. Pakhira, S. Bandyopadhyay and U. Maulik, A study of some fuzzy cluster validity indices, genetic clustering and application to pixel classification. Fuzzy Sets Syst. 155 (2005) 191–214. | MR | DOI

[32] H. Park and C. Jun, A simple and fast algorithm for k -medoids clustering. Expert Syst. App. 36 (2009) 3336–3341. | DOI

[33] R. Punhani, V. P. S. Arora and A. Sai Sabitha, K -prototype algorithm for clustering large data sets with categorical values to established product segmentation. In: Proceedings of Data Analytics and Management, edited by D. Gupta, Z. Polkowski, A. Khanna, S. Bhattacharyya and O. Castillo. Springer, Singapore (2022) 343–353. | DOI

[34] M. Rao, Cluster analysis and mathematical programming. J. Am. Stat. Assoc. 66 (1971) 622–626. | Zbl | DOI

[35] E. Schubert and P. J. Rousseeuw, Faster k -medoids clustering: Improving the pam, clara, and clarans algorithms. In: Similarity Search and Applications, edited by G. Amato, C. Gennaro, V. Oria and M. Radovanović. Springer International Publishing, Cham (2019) 171–187. | DOI

[36] E. Schubert and P. J. Rousseeuw, Fast and eager k -medoids clustering: O ( k ) runtime improvement of the pam, clara, and clarans algorithms. Inf. Syst. 101 (2021) 101804. | DOI

[37] G. Semaan, Algoritmos para o Problema de Agrupamento Automático. Ph.D. thesis, Federal Fluminense University (2013).

[38] G. S. Semaan, J. A. De Moura Brito, I. Machado Coelho, E. Franco Silva, A. Cesar Fadel, L. Satoru Ochi and N. Maculan, A brief history of heuristics: from bounded rationality to intractability. IEEE Latin Am. Trans. 18 (2021) 1975–1986. | DOI

[39] W. Sheng and X. Liu, A hybrid algorithm for k -medoid clustering of large data sets. In: Proceedings of the 2004 Congress on Evolutionary Computation (IEEE Cat. No.04TH8753) 1 (2004) 77–82. | DOI

[40] W. Sheng and X. Liu, A genetic k -medoids clustering algorithm. J. Heuristics 12 (2006) 447–466. | DOI

[41] H. Song, J. G. Lee and W. S. Han, PAMAE: parallel k -medoids clustering with high accuracy and efficiency. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, New York, NY, USA, (2017) 1087–1096.

[42] V. M. Spears and K. A. D. Jong, On the virtues of parameterized uniform crossover. In: Proceedings of the Fourth International Conference on Genetic Algorithms. (1991) 230–236.

[43] B. Tavakkol and Y. Son, Fuzzy kernel k -medoids clustering algorithm for uncertain data objects. Pattern Anal. App. 24 (2021) 1287–1302. | DOI

[44] L. Tseng and S. Yang, A genetic approach to the automatic clustering problem. Pattern Recogn. 34 (2001) 415–424. | Zbl | DOI

[45] A. Ushakov and I. Vasilyev, Near-optimal large-scale k -medoids clustering. Inf. Sci. 545 (2021) 344–362. | MR | Zbl | DOI

[46] A. V. Ushakov and I. Vasilyev, A parallel heuristic for a k -medoids clustering problem with unfixed number of clusters. In: 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) (2019) 1116–1120. | DOI

[47] M. J. Van Der Laan, K. S. Pollard and J. Bryan, A new partitioning around medoids algorithm. J. Stat. Comput. Simul. 78 (2003) 575–584. | MR | Zbl | DOI

[48] H. Vinod, Integer programming and theory of grouping. J. Am. Stat. Assoc. 64 (1969) 506–517. | Zbl | DOI

[49] X. Wang and X. Wang, A fast k -medoids clustering algorithm for image segmentation based object recognition. J. Rob. Autom. 4 (2021) 202–211.

[50] L. Wolsey, Integer Programming, 2nd edition. Wiley (2020). | MR

[51] D. Xu and Y. Tian, comprehensive survey of clustering algorithms. Ann. Data Sci. 2 (2015) 165–193. | DOI

[52] R. Xu and D. Wunsch, Survey clustering algorithms. IEEE Trans. Neural Netw. 16 (2005) 645–678. | DOI

[53] D. Yu, G. Liu, M. Guo and X. Liu, An improved k -medoids algorithm based on step increasing and optimizing. Expert Syst. App. 92 (2018) 464–473. | DOI

[54] S. Zadegan, M. Mirzaie and F. Sadoughi, Ranked k -medoids: a fast and accurate rank-based partitioning algorithm for clustering large datasets. Knowl.-Based Syst. 39 (2013) 133–143. | DOI

[55] Q. Zhang and I. Couloigner, A new and efficient k -medoid algorithm for spatial clustering. Lecture Notes Comput. Sci. 3482 (2005) 181–189. | DOI

[56] S. Zhu, L. Xu and E. Goodman, Evolutionary multi-objective automatic clustering enhanced with quality metrics and ensemble strategy. Knowl. Based Syst. 188 (2020) 105018. | DOI

Cité par Sources :