The Method of Key Sentence Extraction Based on Improved TextRank
CHEN Mengtong (陈梦彤); GU Xiaoyan (谷晓燕); LIU Tiantian (刘甜甜)
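Abstract:
In text mining, texts are usually analyzed through their keywords, an approach that tends to ignore the relationships between words and so reduces the accuracy of the analysis. The TextRank algorithm is a main method for extracting keywords or summaries. It is graph-based and takes the similarity between sentences into account, but it ignores the features of individual words. To address this, an improved TextRank algorithm is proposed that merges similar sentences and then selects key sentences using multiple word features. First, sentence similarity is computed and highly similar sentences are removed from the text. Then, words are scored by word frequency, word meaning, and word position, and a directed graph is constructed. Finally, the average score of each sentence is computed and the sentences are ranked to select the key sentences. Experimental results show that the improved algorithm is more accurate than the other algorithms compared, has lower time complexity, and addresses two problems: keywords describe a text one-sidedly, and summaries are overly verbose.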
Keywords: key sentence extraction; improved TextRank algorithm; similar sentence merging; word features
Foundation: National Natural Science Foundation of China (71701020); National Key Research and Development Program of China (2019YFB1405003); Beijing Social Science Project (19YJB015)
DOI: 10.13705/j.issn.1671-6841.2021394
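
The three steps outlined in the abstract (merge similar sentences, score words on a directed graph, rank sentences by average word score) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the similarity measure, the feature weights, or how the graph is built, so the Jaccard similarity, the 0.8 threshold, the co-occurrence window, the prior formula, and the function names `merge_similar`, `word_scores`, and `key_sentences` are all hypothetical. The word-meaning feature is left out because it would need an external semantic resource. Input is assumed to be sentences that are already tokenized.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two token lists (an assumed similarity measure)."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def merge_similar(sentences, threshold=0.8):
    """Step 1: drop empty sentences and any sentence too similar to one already kept."""
    kept = []
    for tokens in sentences:
        if tokens and all(jaccard(tokens, k) < threshold for k in kept):
            kept.append(tokens)
    return kept


def word_scores(sentences, window=2, damping=0.85, iterations=30):
    """Step 2: rank words on a directed co-occurrence graph.

    Frequency and first position form a prior (hypothetical formula) that
    biases a PageRank-style iteration. Word meaning is not modeled here.
    """
    words = [w for s in sentences for w in s]
    if not words:
        return {}
    freq = Counter(words)
    first_pos = {}
    for i, w in enumerate(words):
        first_pos.setdefault(w, i)
    # Prior: relative frequency plus a bonus for appearing early in the text.
    prior = {w: freq[w] / len(words) + 1.0 / (1 + first_pos[w]) for w in freq}
    norm = sum(prior.values())
    prior = {w: p / norm for w, p in prior.items()}

    # Directed edge u -> v when v follows u within the window in one sentence.
    out_edges = {w: Counter() for w in freq}
    for s in sentences:
        for i, u in enumerate(s):
            for v in s[i + 1:i + 1 + window]:
                if u != v:
                    out_edges[u][v] += 1

    score = dict(prior)
    for _ in range(iterations):
        new = {w: (1 - damping) * prior[w] for w in freq}
        for u, targets in out_edges.items():
            out_total = sum(targets.values())
            if out_total == 0:
                # Dangling word: spread its score according to the prior.
                for w in freq:
                    new[w] += damping * score[u] * prior[w]
            else:
                for v, count in targets.items():
                    new[v] += damping * score[u] * count / out_total
        score = new
    return score


def key_sentences(sentences, top_k=1, threshold=0.8):
    """Step 3: rank the kept sentences by the average score of their words."""
    kept = merge_similar(sentences, threshold)
    scores = word_scores(kept)
    ranked = sorted(kept, key=lambda s: sum(scores[w] for w in s) / len(s),
                    reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    docs = [
        ["textrank", "ranks", "sentences", "on", "a", "graph"],
        ["textrank", "ranks", "sentences", "on", "a", "graph"],
        ["word", "features", "improve", "sentence", "ranking"],
    ]
    for sentence in key_sentences(docs, top_k=2):
        print(" ".join(sentence))
```

Averaging word scores rather than summing them keeps long sentences from winning on length alone, which matches the abstract's mention of an average sentence score. For Chinese text, a tokenizer would be needed before calling `key_sentences`.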