Construction of Chinese Medical Knowledge Graph Based on Multi-source Corpus (基于多来源文本的中文医学知识图谱的构建)
昝红英; 窦华溢; 贾玉祥; 关同峰; 奥德玛; 张坤丽; 穗志方
Abstract:
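The Chinese medical knowledge graph (CMeKG) is a structured description of professional medical knowledge, and building it is an urgent need for a wide range of smart healthcare applications. Medical texts were collected from multiple sources, the structural features of the corpus were analyzed in detail, and, taking into account the semantic characteristics of medical knowledge, an annotation scheme and guidelines were formulated for medical named entities and entity relations. An annotation tool was also developed, and under the guidance of medical experts, 106 high-incidence diseases were selected for manual annotation; the agreement rate reached 87.3% for named entities and 82.9% for entity relations. On the basis of the manual annotation, entities and relations were then extracted automatically. The resulting CMeKG 1.0 covers 6,310 diseases, 19,853 drugs (Western medicines, Chinese patent medicines, and Chinese herbal medicines), and 1,237 diagnostic and therapeutic techniques and devices, linked to more than 200,000 medical entities, with more than 1,000,000 concept-relation instances and attribute triples. The knowledge graph provides a foundation of professional knowledge for fields such as medical question answering systems and intelligent assisted diagnosis and treatment.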
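The abstract reports agreement rates for named entities and entity relations but does not say here how they were computed. As an illustration only, the sketch below shows one common way to score agreement between two annotators: treat one annotator's entity set as the reference and compute the F-measure of the other against it. The `Entity` tuple layout, the function name `agreement_f1`, and the sample annotations are all hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (start offset, end offset, entity type).
Entity = Tuple[int, int, str]


def agreement_f1(a: Set[Entity], b: Set[Entity]) -> float:
    """Pairwise agreement as F1, using exact span-and-type matching.

    Annotator `a` is treated as the reference and `b` as the prediction.
    F1 is symmetric, so swapping the two gives the same score.
    """
    if not a and not b:
        return 1.0  # Both annotators marked nothing: trivially in agreement.
    matched = len(a & b)
    if matched == 0:
        return 0.0
    precision = matched / len(b)
    recall = matched / len(a)
    return 2 * precision * recall / (precision + recall)


# Made-up annotations of the same sentence by two annotators.
annotator_a = {(0, 3, "disease"), (5, 9, "drug"), (12, 16, "symptom")}
annotator_b = {(0, 3, "disease"), (5, 9, "drug"), (12, 15, "symptom")}

score = agreement_f1(annotator_a, annotator_b)
print(f"agreement (F1): {score:.3f}")
```

Exact matching is the strictest choice: the two annotators above disagree on the third entity only by one character of span boundary, yet it counts as a full miss. A relaxed variant that accepts overlapping spans of the same type would give a higher score on the same data.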
Keywords: medical knowledge graph; named entity; entity relation; annotation specification; knowledge graph construction
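Foundation: Major Program of the National Social Science Foundation of China (18ZDA315); Key Scientific Research Project of Higher Education Institutions of Henan Province (20A520038); Science and Technology Research Project of Henan Province (192102210260); International Cooperation Project of the Science and Technology Research Program of Henan Province (172102410065)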
DOI: 10.13705/j.issn.1671-6841.2019383