Abstract: To retrieve frequently updated document information efficiently, an N-Gram statistical information retrieval model based on Bayesian theory (Bayesian-based N-Gram, BNG) is proposed. BNG does not need to re-learn all documents; it only adjusts its weight parameters adaptively according to the newly added documents, so as to highlight the different contributions of each term and document to the semantic space. Based on the current observation samples, BNG adjusts the parameters of a baseline model obtained by maximum likelihood estimation, and then estimates the hyperparameters from the parameters of the baseline model. Experimental results show that, compared with existing statistical information models, the proposed BNG model significantly improves the precision and recall of information retrieval.
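The abstract gives no formulas, so the following is only a minimal sketch of the general idea of updating an N-Gram model from incremental documents without re-learning the whole collection. It is not the authors' BNG model. It assumes a bigram model whose maximum-likelihood counts are smoothed with a Dirichlet-style prior; the class name `IncrementalBigramModel`, the prior strength `mu`, and the add-one unigram prior are all hypothetical choices made for illustration.

```python
from collections import Counter


class IncrementalBigramModel:
    """Illustrative bigram model updated in place as new documents arrive.

    Assumption: a Dirichlet-style prior with strength `mu` stands in for
    the Bayesian adjustment described in the abstract.
    """

    def __init__(self, mu=1.0):
        self.mu = mu              # hypothetical prior strength (hyperparameter)
        self.bigram = {}          # previous token -> Counter of next tokens
        self.unigram = Counter()  # token -> count
        self.total = 0            # total number of tokens seen so far

    def update(self, tokens):
        """Fold one new document into the counts; old documents are not revisited."""
        for tok in tokens:
            self.unigram[tok] += 1
            self.total += 1
        for prev, cur in zip(tokens, tokens[1:]):
            self.bigram.setdefault(prev, Counter())[cur] += 1

    def prob(self, prev, cur):
        """Posterior-mean estimate of P(cur | prev) under the assumed prior."""
        vocab = len(self.unigram)
        # Add-one smoothed unigram prior; the extra +1 reserves mass for unseen tokens.
        prior = (self.unigram[cur] + 1) / (self.total + vocab + 1)
        ctx = self.bigram.get(prev, Counter())
        ctx_total = sum(ctx.values())
        return (ctx[cur] + self.mu * prior) / (ctx_total + self.mu)


model = IncrementalBigramModel(mu=1.0)
model.update("a b a b".split())      # baseline collection
p_before = model.prob("a", "c")
model.update("a c".split())          # newly added document
p_after = model.prob("a", "c")
print(p_before, p_after)
```

In this sketch, adding a document only increments counts, so the estimate for a bigram seen in the new document rises without touching the earlier documents. How BNG actually weights terms and documents, and how it estimates its hyperparameters, is not specified in the abstract.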
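Basic information:
Chinese Library Classification (CLC) number: TP391.3
Citation:
[1] Ren Zhaofu (任照富), Chang Youqu (常友渠), Fan Aiwan (樊爱宛). Bayesian-based N-Gram statistical information retrieval model [J], 2010, 42(01): 21-23+37.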