| Downloads | Citations | Views |
| 1,421 | 10 | 261 |
Abstract: Human pose estimation has been an active topic in computer vision in recent years, with substantial benefits and potential applications for improving human life. Deep neural networks have developed rapidly, and compared with traditional methods, deep learning approaches are better at extracting representational information from images. This survey reviews recent progress in human pose estimation, which is divided into single-person and multi-person pose estimation according to the number of people to be detected. For single-person pose estimation, it introduces coordinate regression methods, which directly predict the coordinates of human keypoints, and heatmap detection methods, which predict a Gaussian distribution for each keypoint. For multi-person pose estimation, it covers top-down methods, which reduce the multi-person problem to a series of single-person problems, and bottom-up methods, which process the keypoints of all people directly. Finally, the characteristics, advantages, and disadvantages of each method's network structure are summarized, and current problems and future development trends are discussed.
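The difference between the two single-person formulations named in the abstract can be illustrated with a minimal sketch. It is not taken from the surveyed papers: the function names `gaussian_heatmap` and `decode_heatmap`, the 64 x 48 grid, the keypoint location, and `sigma=2.0` are all assumptions chosen for illustration. Coordinate regression would supervise a network with the keypoint coordinates themselves, whereas heatmap detection supervises it with a Gaussian rendered around the keypoint and recovers the coordinates from the peak of the map.

```python
import math

def gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    """Render a 2D Gaussian centered at keypoint (cx, cy) on a height x width grid."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def decode_heatmap(heatmap):
    """Recover the keypoint as the (x, y) location of the heatmap maximum."""
    best_value, best_xy = -1.0, (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_xy = value, (x, y)
    return best_xy

# Hypothetical keypoint (x, y) in heatmap pixels.
keypoint = (20, 12)

# Coordinate regression target: the coordinates themselves.
regression_target = keypoint

# Heatmap detection target: a Gaussian map peaking at the keypoint.
heatmap_target = gaussian_heatmap(48, 64, *keypoint)

# Decoding the heatmap returns the keypoint location.
recovered = decode_heatmap(heatmap_target)
```

In this sketch the heatmap target keeps spatial structure around the keypoint, at the cost of an extra decoding step and a resolution limited by the grid size.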
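Basic information:
DOI: 10.13705/j.issn.1671-6841.2022334
CLC number: TP391.41; TP18
Citation:
[1] WANG Ke, CHEN Qiteng, CHEN Wei, et al. A survey of two-dimensional human pose estimation based on deep learning[J]. Journal of Zhengzhou University (Natural Science Edition), 2024, 56(04): 11-20. DOI: 10.13705/j.issn.1671-6841.2022334.
Funding:
National Natural Science Foundation of China (52274160, 51874300, 52074305)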