Feature Selection Algorithm Based on Hierarchical Granulation (基于层次粒化的特征选择算法)
CHEN Huihuang (陈辉皇), LIN Yaojin (林耀进), WANG Chenxi (王晨曦), TONG Xianqun (童先群), HU Minjie (胡敏杰)
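Abstract:
In many practical applications, the feature space has a hierarchical granulation structure. First, hierarchical clustering based on a kernel-method measure is proposed to granulate the feature space hierarchically. Second, within each subspace produced by the granulation, neighborhood mutual information is used to assess the maximal relevance between features and labels and the minimal redundancy among features, and the features are ranked at a specified level of the hierarchy. On this basis, a portion of the representative features is selected from each subspace to form the final feature subset. Finally, experiments on 6 UCI data sets with 2 different base classifiers demonstrate the effectiveness of the proposed algorithm.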
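The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the average-linkage merge rule, the difference form of the relevance/redundancy score, the neighborhood definition, and every name and parameter below (`granulate`, `rank_group`, `select_features`, `delta`, `sigma`, `per_group`) are assumptions made for illustration. Features are assumed to be numeric and scaled to [0, 1], and labels to be discrete.

```python
import math

def neighborhoods(values, delta, discrete=False):
    """Per sample, the set of sample indices lying in its neighborhood."""
    n = len(values)
    if discrete:  # labels: neighbors are the samples of the same class
        return [{j for j in range(n) if values[j] == values[i]} for i in range(n)]
    return [{j for j in range(n) if abs(values[j] - values[i]) <= delta}
            for i in range(n)]

def nmi(nb_a, nb_b):
    """Neighborhood mutual information (assumed form):
    -(1/n) * sum_i log(|A_i| * |B_i| / (n * |A_i & B_i|))."""
    n = len(nb_a)
    total = sum(math.log(len(a) * len(b) / (n * len(a & b)))
                for a, b in zip(nb_a, nb_b))
    return -total / n

def kernel_distance(u, v, sigma):
    """Distance induced by a Gaussian kernel: sqrt(2 - 2 * k(u, v))."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.sqrt(max(0.0, 2.0 - 2.0 * math.exp(-sq / (2.0 * sigma ** 2))))

def granulate(columns, n_groups, sigma=1.0):
    """Average-linkage agglomerative clustering of feature columns,
    stopped at the level that leaves n_groups feature groups."""
    clusters = [[j] for j in range(len(columns))]
    while len(clusters) > n_groups:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(kernel_distance(columns[p], columns[q], sigma)
                        for p in clusters[a] for q in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)   # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters

def rank_group(group, nbs, nb_y):
    """Greedy ranking inside one group: relevance to the labels
    minus mean redundancy with the features already ranked."""
    remaining, ranked = list(group), []
    while remaining:
        def score(f):
            rel = nmi(nbs[f], nb_y)
            red = (sum(nmi(nbs[f], nbs[s]) for s in ranked) / len(ranked)
                   if ranked else 0.0)
            return rel - red
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked

def select_features(X, y, n_groups=2, per_group=1, delta=0.12, sigma=1.0):
    """Granulate the features, rank each group, keep the top per_group."""
    columns = [list(col) for col in zip(*X)]
    nbs = [neighborhoods(col, delta) for col in columns]
    nb_y = neighborhoods(y, delta, discrete=True)
    selected = []
    for group in granulate(columns, n_groups, sigma):
        selected += rank_group(group, nbs, nb_y)[:per_group]
    return sorted(selected)

# Toy data (made up): features 0 and 1 are near-duplicates, as are 2 and 3.
X = [[0.0, 0.05, 0.1, 0.1], [0.1, 0.10, 0.9, 0.9], [0.2, 0.20, 0.5, 0.5],
     [0.8, 0.80, 0.2, 0.2], [0.9, 0.90, 0.8, 0.8], [1.0, 0.95, 0.4, 0.5]]
y = [0, 0, 0, 1, 1, 1]
print(select_features(X, y))
```

On the toy data, near-duplicate features fall into the same group, so taking one representative per group avoids selecting both copies. The pairwise loops are quadratic in the number of samples and cubic in the number of features, which is acceptable only for small illustrations.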
Keywords: feature selection; granular computing; hierarchical granulation; mutual information
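Foundation: National Natural Science Foundation of China (61303131, 61672272); Program for New Century Excellent Talents in Fujian Province University; Science and Technology Project of the Education Department of Fujian Province (JA14192)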
DOI: 10.13705/j.issn.1671-6841.2016096