Abstract: Chunk parsing is a typical shallow syntactic analysis task, but most existing work concentrates on identifying chunk boundaries and chunk syntactic functions and does not analyze the relations inside a chunk. This paper uses a rule-based, multi-structure fusion method to analyze the syntactic relations between the words in a chunk: the rule set is combined with finite-state automata (FSA), tree structures, lattice structures, and collocation knowledge, and the analysis results are expressed as triples. In an experiment on a test set of 2,005 gold-standard sentences from the Penn Chinese Treebank (CTB 8.0), the method reached an F1 score of 85.82%. The method deepens syntactic analysis and moves traditional chunk parsing toward full syntactic parsing.
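The abstract reports that intra-chunk relations are output as triples and scored with F1. As a minimal sketch of what such scoring can look like, the Python below compares a predicted set of triples against a gold set. The field layout (head, dependent, label), the relation label names, and the exact-match scoring rule are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import NamedTuple, Set, Tuple


class Relation(NamedTuple):
    """One intra-chunk relation as a triple (field layout is hypothetical)."""
    head: str
    dependent: str
    label: str


def precision_recall_f1(predicted: Set[Relation],
                        gold: Set[Relation]) -> Tuple[float, float, float]:
    """Score predicted triples against gold triples by exact match."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy noun chunk: 三本 (three + classifier), 新 (new), 书 (book).
# The labels "attr" and "quant" are invented for this example.
gold = {Relation("书", "新", "attr"), Relation("书", "三本", "quant")}
pred = {Relation("书", "新", "attr"), Relation("书", "三本", "attr")}

p, r, f = precision_recall_f1(pred, gold)
print(p, r, f)
```

Here one of the two predicted triples matches the gold set exactly, so precision, recall, and F1 all come out to 0.5. Treating triples as set members means a wrong label counts as both a false positive and a false negative.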
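Basic information:
DOI: 10.13705/j.issn.1671-6841.2021322
Chinese Library Classification (CLC) number: TP391.1
Citation:
[1] WANG G R, XUN E D, RAO G Q. Rule-based analysis of intra-chunk relations in Chinese [J], 2022, 54(03): 28-33. DOI: 10.13705/j.issn.1671-6841.2021322.
Funding:
National Natural Science Foundation of China (62076038)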