Abstract: To address the problem that redundant information in high-dimensional datasets degrades the clustering performance of three-way Gaussian mixture models, granular ball neighborhood rough set theory is integrated into the three-way Gaussian mixture clustering model, and a three-way Gaussian mixture clustering model based on granular ball neighborhood rough sets is proposed. First, k-means clustering is used to generate a set of granular balls that meet the purity requirement, and attribute reduction is then performed under the constraint that the positive region generated by the granular balls remains unchanged, which extracts the key attributes. Second, the three-way Gaussian mixture model clusters the reduced data, assigning each object to the core region or the boundary region of a cluster. Comparative experiments on 7 UCI public datasets show that the proposed model not only inherits the superior clustering performance of the three-way Gaussian mixture model, with higher accuracy, a higher silhouette coefficient, and a lower Davies-Bouldin index, but also characterizes the boundary parts of clusters more accurately. In addition, because attribute reduction is applied to the high-dimensional space, the proposed model has lower time complexity.
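The second step, assigning objects to the core or boundary region of a cluster, can be illustrated with a minimal sketch. The abstract does not state the assignment rule, so the rule below is an assumption: a point joins the core region of its most probable cluster when that posterior reaches a threshold `alpha`, and otherwise joins the boundary region of every cluster whose posterior reaches `beta`. The function names, the thresholds, and the one-dimensional mixture with fixed parameters are all hypothetical and chosen only for illustration. Granular ball generation and attribute reduction are not shown.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component for a single point x."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def three_way_assign(points, weights, means, variances, alpha=0.8, beta=0.2):
    """Split point indices into core and boundary regions per cluster.

    ASSUMED rule (not taken from the paper): a point goes to the core region
    of its most probable cluster if that posterior is >= alpha; otherwise it
    goes to the boundary region of every cluster with posterior >= beta.
    """
    k = len(weights)
    core = [[] for _ in range(k)]
    boundary = [[] for _ in range(k)]
    for i, x in enumerate(points):
        post = posteriors(x, weights, means, variances)
        best = max(range(k), key=lambda j: post[j])
        if post[best] >= alpha:
            core[best].append(i)
        else:
            for j in range(k):
                if post[j] >= beta:
                    boundary[j].append(i)
    return core, boundary


# Hypothetical two-component mixture with equal weights and unit variances.
points = [0.0, 0.2, 2.0, 3.9, 4.1]
core, boundary = three_way_assign(points, [0.5, 0.5], [0.0, 4.0], [1.0, 1.0])
print(core)      # indices of points in each cluster's core region
print(boundary)  # indices of points in each cluster's boundary region
```

In this toy setup, the point at 2.0 lies midway between the two component means, so it falls in the boundary region of both clusters, while the remaining points are assigned to a core region. In practice, the mixture parameters would be estimated from the reduced data (for example by EM) rather than fixed by hand.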
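Basic information:
DOI: 10.13705/j.issn.1671-6841.2024108
CLC number: TP18
Citation:
[1] SHAO C M, WAN R X, MIAO D Q, et al. Three-way Gaussian mixture clustering based on granular ball neighborhood rough sets[J]. Journal of Zhengzhou university (natural science edition), 2025, 57(06): 16-23. DOI: 10.13705/j.issn.1671-6841.2024108.
Funding:
National Natural Science Foundation of China (62066001); Ningxia Science and Technology Leading Talent Project (2022GKLRLX08); Ningxia Natural Science Foundation (2021AAC03203)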