Graph Clustering Based on Modularity Variation Estimations
Graph clustering is one of the constantly actual data analysis problems. There are various statements of this problem. In this paper we consider the statement of search for a partition of a graph vertices set into disjoint subsets. In such a way, that the density of connections between the vertices of one subset was higher than that between the vertices of different subsets.
There is a lot of various approaches, and many of them use such as an a posteriori estimate of clustering quality, as modularity. The modularity functional, taking the value from [-1, 1], allows us to formally evaluate the quality of partitioning into subsets. This paper deals with an approach, instead of calculating the modularity, using a less computationally complex estimate of modularity change in the operation of clusters union.
Four theorems for different graph types are formulated, presenting the possibility of application of suggested estimate, instead of direct modularity calculations. A greedy al- gorithmic scheme and also AMVE (Algorithm based on Modularity Variation Estimation) — simple greedy algorithm based on the scheme are proposed.
An attempt of comparative analysis on the test problemet of AMVE with heuristic clustering algorithms implemented in actual data analysis software is described. It is shown the comparative advantage of AMVE, both in terms of speed and quality of clustering.
Also, the cases on the use of developed algorithm and its implementation for data analysis in sociology and history and criticism of literature are mentioned. In these cases, investigated graphs based on social networks data (the ratio of ”friendship” in the social network between users used as the graph edges). It is demonstrated a slight superiority of AMVE in clustering quality compared to the known algorithms Louvain and Walktrap.
Nikita N. Martynov, Undergraduate, Buryat State University, 24a, Smolin st., Ulan-Ude, Republic of Buryati, 670000, Russian Federation, tel.: (+7914)6368355, e-mail: firstname.lastname@example.org
Olga V. Khandarova, Cand. Sci. (Philology), Junior Research Scientist, Institute for Mongolian, Buddhist and Tibetan Studies SB RAS, 6, Sakhyanova st., Ulan-Ude, Republic of Buryatia, 670000, Russian Federation, tel.: (+7924)7731409, e-mail: email@example.com
Fedor V. Khandarov, Cand. Sci. (Technics), Senior Lecturer, Buryat State University, 24a, Smolin st., Ulan-Ude, Republic of Buryatia, 670000, Russian Federation, tel.: (+7924)4563112, e-mail: firstname.lastname@example.org
1. Badmatsyrenov T.B., Skvortsov M.V., Khandarov F.V. Buddhist digital practices of transcendence: VK-community “Hambo Lama Dashi-Dorzho Itigilov”. Monitoring of Public Opinion: Economic and Social Changes, 2018, no. 2, pp. 309-332. (in Russian) https://doi.org/10.14515/monitoring.2018.2.18.
2. Blondel V. D. et al. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment, 2008, vol. 2008, no. 10, pp. 10008. https://doi.org/10.1088/1742-5468/2008/10/P10008
3. Csardi G., Nepusz T. The igraph software package for complex network research. InterJournal, Complex Systems, 2006, vol. 1695, no. 5, pp. 1-9.
4. Flake G. W., Lawrence S., Giles C. L. Efficient identification of web communities. Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2000, pp. 150-160. https://doi.org/10.1145/347090.347121
5. Fortunato S. Community detection in graphs. Physics reports, 2010, vol. 486. N. 3-5, pp. 75-174. https://doi.org/10.1016/j.physrep.2009.11.002
6. Khandarova O.V., Khandarov F.V., Martynov N.N., Skvortsov M.V. Writers of Buryatia on Facebook: literary reputations in the virtual space Mediafilosofiya, 2017, vol. 13, pp. 212-226. (in Russian)
7. Le Martelot E., Hankin C. Fast multi-scale detection of relevant communities in large-scale networks. The Computer Journal, 2013, vol. 56, no. 9, pp. 1136-1150. https://doi.org/10.1093/comjnl/bxt002
8. Leskovec J., Lang K. J., Dasgupta A., Mahoney M. W. Community structure in large networks: Natural cluster sizes and the absence of large welldefined clusters. Internet Mathematics, 2009, vol. 6, no. 1, pp. 29-123. https://doi.org/10.1080/15427951.2009.10129177
9. Leskovec J., Lang K. J., Mahoney M. Empirical comparison of algorithms for network community detection. Proceedings of the 19th international conference on World wide web. ACM, 2010, pp. 631-640. https://doi.org/10.1145/1772690.1772755
10. Maulik U., Bandyopadhyay S. Genetic algorithm-based clustering technique. Pattern recognition, 2000, vol. 33, no. 9, pp. 1455-1465. https://doi.org/10.1016/S0031-3203(99)00137-5
11. Newman M. E. J. Detecting community structure in networks. The European Physical Journal B, 2004, vol. 38, no. 2, pp. 321-330. https://doi.org/10.1140/epjb/e2004-00124-y
12. Pizzuti C. A multiobjective genetic algorithm to find communities in complex networks. IEEE Transactions on Evolutionary Computation, 2012, vol. 16, no. 3, pp. 418-430. https://doi.org/10.1109/TEVC.2011.2161090
13. Pons P., Latapy M. Computing communities in large networks using random walks. Journal of Graph Algorithms and Applications, 2006, vol. 10, no. 2, pp. 191–218. https://doi.org/10.1007/11569596 31
14. Radicchi F., Castellano C., Cecconi F., Loreto V., Parisi D. Defining and identifying communities in networks. Proceedings of the National Academy of Sciences, 2004, vol. 101, no. 9, pp. 2658-2663. https://doi.org/10.1073/pnas.0400054101
15. Raghavan U. N., Albert R., Kumara S. Near linear time algorithm to detect community structures in large-scale networks. Physical review E, 2007, vol. 76, no. 3, pp. 036106. https://doi.org/10.1103/PhysRevE.76.036106
16. Reichardt J., Bornholdt S. Statistical mechanics of community detection. Physical Review E, 2006, vol. 74, no. 1, pp. 016110. https://doi.org/10.1103/PhysRevE.74.016110