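2025, No. 02, Vol. 57, pp. 24-30
Hierarchical Multi-Label Text Classification Method Based on Multi-Scale Feature Extraction
Foundation: National Key Research and Development Program of China (2021YFF0704100); National Natural Science Foundation of China (62136002, 62233018); Natural Science Foundation of Chongqing (cstc2022 ycjh-bgzxm0004)
Email: yuhong@cqupt.edu.cn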
DOI: 10.13705/j.issn.1671-6841.2023120
Abstract:

To address the problem that existing feature extraction methods neglect the connections between local and global information in text, a hierarchical multi-label text classification method based on multi-scale feature extraction is proposed. First, a multi-scale feature extraction module is designed to capture features at different scales and represent text semantics more fully. Second, hierarchical features are embedded into the text representation, yielding a text semantic representation that carries label features. Finally, guided by the label hierarchy, positive and negative samples are constructed for each input text, and contrastive learning is performed to improve classification. Comparative experiments on the WOS, RCV1-V2, NYT, and AAPD datasets show that the proposed model outperforms other mainstream models on the evaluation metrics. In addition, hierarchical Micro-F1 and hierarchical Macro-F1 metrics are proposed for hierarchical classification and used to evaluate the model.
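
The idea of capturing features at different scales can be illustrated with a small, generic sketch. This is not the module described in the paper, whose architecture is not given in the abstract. It assumes that a "scale" is a window width over token vectors, and it replaces learned convolution filters with fixed window averaging followed by max-pooling so that the example stays self-contained. The names pooled_window_features and multi_scale_features, and the default widths (1, 2, 4), are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def pooled_window_features(tokens: Sequence[Vector], width: int) -> Vector:
    """Average every window of `width` consecutive token vectors,
    then max-pool each dimension over all window positions.

    Assumption: fixed averaging stands in for a learned convolution filter.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    width = min(width, len(tokens))  # short texts fall back to a single window
    dim = len(tokens[0])
    windows = []
    for start in range(len(tokens) - width + 1):
        chunk = tokens[start:start + width]
        windows.append([sum(vec[d] for vec in chunk) / width for d in range(dim)])
    return [max(w[d] for w in windows) for d in range(dim)]


def multi_scale_features(tokens: Sequence[Vector],
                         widths: Sequence[int] = (1, 2, 4)) -> Vector:
    """Concatenate pooled features from several window widths (scales).

    Small widths react to local patterns; a width equal to the text
    length summarizes the whole text.
    """
    features: Vector = []
    for width in widths:
        features.extend(pooled_window_features(tokens, width))
    return features


if __name__ == "__main__":
    # Four toy token vectors of dimension 2 (hypothetical embeddings).
    toy = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 4.0]]
    print(multi_scale_features(toy))
```

The output vector has one block of features per window width, so a downstream classifier sees local and global evidence side by side. In a trained model the averaging step would typically be replaced by learned filters.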

Basic information:

CLC number: TP391.1; TP18

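Citation:

[1] WU Zixuan (武子轩), WANG Ye (王烨), YU Hong (于洪). Hierarchical multi-label text classification method based on multi-scale feature extraction[J]. Journal of Zhengzhou University (Natural Science Edition), 2025, 57(02): 24-30. DOI: 10.13705/j.issn.1671-6841.2023120.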
