Abstract: Still image action recognition suffers from poor performance because large-scale training datasets are lacking and spatiotemporal features cannot be exploited. To address this, a still image action recognition method that combines a residual neural network (ResNet) with the convolutional block attention module (CBAM) is proposed. Specific data augmentation techniques are used to expand the dataset, and the model is initialized by transfer learning and then fine-tuned to strengthen its feature representation for still image action recognition. The CBAM is embedded after the first convolutional layer of ResNet to adjust the model's attention. The Grad-CAM method is used to extract and visualize the regions the model attends to when recognizing an image, which explains the improvement in precision. On the PPMI dataset, the proposed method achieves average recognition precision of 88.30%, 81.94%, and 77.93% on the instrument-playing, instrument-holding, and overall categories, respectively, which verifies its effectiveness.
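The attention step described in the abstract can be illustrated with a minimal NumPy sketch of a CBAM forward pass: channel attention followed by spatial attention, applied to a feature map of the kind produced by a first convolutional layer. This is not the authors' implementation. The 64-channel feature map, the reduction ratio of 16, the 7x7 spatial kernel, and the random weights (standing in for trained parameters) are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """x: (C, H, W). w1: (C//r, C) and w2: (C, C//r) form a shared two-layer MLP."""
    avg = x.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = x.max(axis=(1, 2))    # global max pooling -> (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # linear -> ReLU -> linear

    return sigmoid(mlp(avg) + mlp(mx))  # per-channel weights in (0, 1)

def spatial_attention(x, kernel):
    """x: (C, H, W). kernel: (2, k, k) with odd k. Returns (H, W) weights."""
    # Pool across channels, then stack the two maps as a 2-channel descriptor.
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = x.shape[1:]
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            # Same-size k x k convolution (cross-correlation) over the descriptor.
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)  # per-location weights in (0, 1)

def cbam(x, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, kernel)[None, :, :]

# Assumed sizes: 64 channels, 8x8 spatial grid, reduction ratio 16.
rng = np.random.default_rng(0)
C, H, W, r = 64, 8, 8, 16
feat = rng.standard_normal((C, H, W))         # stand-in for conv1 output
w1 = rng.standard_normal((C // r, C)) * 0.1   # random, untrained weights
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
refined = cbam(feat, w1, w2, kernel)
```

Because both attention maps pass through a sigmoid, the module only rescales features and leaves the tensor shape unchanged, which is what allows it to be placed between existing layers of a pretrained network.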
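Basic information:
DOI: 10.13705/j.issn.1671-6841.2023171
CLC number: TP391.41
Citation:
[1] GAO Han, WAN Fangjie, MA Mingxu. Still image action recognition method combining ResNet and CBAM[J]. Journal of Zhengzhou University (Natural Science Edition), 2025, 57(03): 65-71. DOI: 10.13705/j.issn.1671-6841.2023171.
Funding:
Major Science and Technology Project of Henan Province (221100210100)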