2025, Issue 02, Vol. 57, pp. 24-30
A Hierarchical Multi-Label Text Classification Method Based on Multi-Scale Feature Extraction
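Foundation: National Key Research and Development Program of China (2021YFF0704100); National Natural Science Foundation of China (62136002, 62233018); Natural Science Foundation of Chongqing (cstc2022 ycjh-bgzxm0004)
Email: yuhong@cqupt.edu.cn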
DOI: 10.13705/j.issn.1671-6841.2023120
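Author affiliation:

Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications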

Abstract:

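To address the problem that existing feature extraction methods ignore the relationship between local and global information in text, a hierarchical multi-label text classification method based on multi-scale feature extraction is proposed. First, a multi-scale feature extraction module is designed to capture features at different scales and better represent text semantics. Second, hierarchical features are embedded into the text representation to obtain a text semantic representation that carries label features. Finally, guided by the label hierarchy, positive and negative samples are constructed for the input text and contrastive learning is performed to improve classification performance. Comparative experiments on the WOS, RCV1-V2, NYT, and AAPD datasets show that the proposed model outperforms other mainstream models on the evaluation metrics. In addition, hierarchical Micro-F1 and hierarchical Macro-F1 metrics are proposed for hierarchical classification and used to evaluate the model.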

Keywords: hierarchical multi-label text classification; multi-scale feature extraction; contrastive learning; hierarchical Micro-F1; hierarchical Macro-F1
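The abstract names hierarchical Micro-F1 and hierarchical Macro-F1 but does not define them here. As a rough illustration only, the sketch below follows one common convention for hierarchy-aware F1: each true and predicted label set is extended with all of its ancestors before true positives, false positives, and false negatives are counted. This convention is an assumption and may differ from the paper's definition; the function names (`with_ancestors`, `hierarchical_f1`) and the `parent` mapping are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def with_ancestors(labels: Iterable[str], parent: Dict[str, str]) -> Set[str]:
    """Return the label set extended with every ancestor in the hierarchy."""
    closed: Set[str] = set()
    for label in labels:
        node = label
        # Walk up until we pass a top-level label or reach a node already added.
        while node is not None and node not in closed:
            closed.add(node)
            node = parent.get(node)
    return closed


def _f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def hierarchical_f1(
    y_true: List[Iterable[str]],
    y_pred: List[Iterable[str]],
    parent: Dict[str, str],
) -> Dict[str, float]:
    """Micro- and macro-averaged F1 over ancestor-augmented label sets.

    Assumed convention, not taken from the paper.
    """
    counts: Dict[str, List[int]] = {}  # label -> [tp, fp, fn]
    for true_labels, pred_labels in zip(y_true, y_pred):
        t = with_ancestors(true_labels, parent)
        p = with_ancestors(pred_labels, parent)
        for label in t | p:
            c = counts.setdefault(label, [0, 0, 0])
            if label in t and label in p:
                c[0] += 1
            elif label in p:
                c[1] += 1
            else:
                c[2] += 1
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = _f1(tp, fp, fn)
    # Macro average is taken over labels that occur in y_true or y_pred.
    macro = sum(_f1(*c) for c in counts.values()) / len(counts) if counts else 0.0
    return {"micro": micro, "macro": macro}


# Toy hierarchy: CS -> {ML, NLP}, Math -> {Algebra}
parent = {"ML": "CS", "NLP": "CS", "Algebra": "Math"}
scores = hierarchical_f1([["ML"], ["Algebra"]], [["NLP"], ["Algebra"]], parent)
print(scores)  # micro = 0.75, macro = 0.6
```

In this toy case, predicting NLP instead of ML still earns credit for the shared parent CS, so the hierarchical Micro-F1 (0.75) is higher than the flat Micro-F1 on the leaf labels alone (0.5).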

Basic information:

CLC number: TP391.1; TP18

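Citation:

[1] WU Zixuan, WANG Ye, YU Hong. A hierarchical multi-label text classification method based on multi-scale feature extraction[J]. Journal of Zhengzhou University (Natural Science Edition), 2025, 57(02): 24-30. DOI:10.13705/j.issn.1671-6841.2023120.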
