DOI: 10.1145/3394171.3413635
Research article

Action2Motion: Conditioned Generation of 3D Human Motions

Published: 12 October 2020

Abstract

Action recognition is a relatively established task: given an input sequence of human motion, the goal is to predict its action category. This paper considers a relatively new problem that can be thought of as the inverse of action recognition: given a prescribed action type, we aim to generate plausible human motion sequences in 3D. Importantly, the set of generated motions is expected to remain diverse, exploring the entire action-conditioned motion space, while each sampled sequence should faithfully resemble natural human body articulation dynamics. Motivated by these objectives, we follow the kinematics of the human body by adopting Lie algebra theory to represent natural human motions, and we propose a temporal Variational Auto-Encoder (VAE) that encourages diverse sampling of the motion space. A new 3D human motion dataset, HumanAct12, is also constructed. Empirical experiments over three distinct human motion datasets (including ours) demonstrate the effectiveness of our approach.
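The temporal VAE described in the abstract generates a pose per time step, conditioned on the action category, with a fresh latent sample at each step driving diversity. The sketch below is a minimal numpy illustration of that generation loop only; the weights are randomly initialized stand-ins for trained parameters, and all names and dimensions (`generate`, `W_h`, `W_p`, `Z_DIM`, etc.) are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_ACTIONS, Z_DIM, H_DIM, POSE_DIM = 12, 8, 32, 72  # illustrative sizes

# Randomly initialized weights stand in for trained decoder parameters.
W_h = rng.normal(scale=0.1, size=(H_DIM, H_DIM + Z_DIM + NUM_ACTIONS))
W_p = rng.normal(scale=0.1, size=(POSE_DIM, H_DIM))

def generate(action_id, num_frames):
    """Sample one motion sequence for a given action: draw a latent z per
    frame and roll it through a recurrent decoder to emit pose parameters."""
    c = np.zeros(NUM_ACTIONS)
    c[action_id] = 1.0                                 # action condition (one-hot)
    h = np.zeros(H_DIM)                                # recurrent state
    poses = []
    for _ in range(num_frames):
        z = rng.standard_normal(Z_DIM)                 # per-frame latent -> diversity
        h = np.tanh(W_h @ np.concatenate([h, z, c]))   # RNN-style state update
        poses.append(W_p @ h)                          # decode pose parameters
    return np.stack(poses)
```

Because a new latent is drawn per frame, two calls with the same action produce different but same-shaped sequences, which is the diversity property the abstract emphasizes.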

Supplementary Material

MP4 File (3394171.3413635.mp4)
We propose a VAE-based network for natural and diverse human motion generation conditioned only on action types. The VAE, built on an RNN architecture, handles stochastic temporal pose generation, while Lie algebra naturally represents human poses. Our method outperforms the comparison baselines over three datasets. We also release a new action-annotated human motion dataset.
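Representing a joint rotation as an element of the Lie algebra so(3) (an axis-angle vector) and mapping it to a rotation matrix via the exponential map is the standard construction behind such Lie-algebra pose representations. A minimal sketch using Rodrigues' formula follows; the function name `exp_so3` is illustrative, not from the paper's code.

```python
import numpy as np

def exp_so3(omega):
    """Exponential map so(3) -> SO(3): convert an axis-angle vector omega
    into a 3x3 rotation matrix via Rodrigues' formula."""
    theta = np.linalg.norm(omega)      # rotation angle is the vector norm
    if theta < 1e-8:
        return np.eye(3)               # near-zero rotation: identity
    k = omega / theta                  # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],  # skew-symmetric cross-product matrix
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
```

A chain of such per-joint rotations, composed along the skeleton hierarchy, reconstructs joint positions from the compact Lie-algebra parameters.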



Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN:9781450379885
DOI:10.1145/3394171
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. 3D animation
  2. 3D motion generation
  3. Lie algebra
  4. variational auto-encoder


Funding Sources

  • NSERC Discovery Grant

Conference

MM '20

Acceptance Rates

Overall Acceptance Rate 995 of 4,171 submissions, 24%



Cited By

  • (2024)ASMNet: Action and Style-Conditioned Motion Generative Network for 3D Human Motion GenerationCyborg and Bionic Systems10.34133/cbsystems.00905Online publication date: 6-Feb-2024
  • (2024)Incorporating variational auto-encoder networks for text-driven generation of 3D motion human bodyJournal of Image and Graphics10.11834/jig.23029129:5(1434-1446)Online publication date: 2024
  • (2024)AdaptControl: Adaptive Human Motion Control and Generation via User Prompt and Spatial Trajectory GuidanceProceedings of the 5th International Workshop on Human-centric Multimedia Analysis10.1145/3688865.3689476(13-22)Online publication date: 28-Oct-2024
  • (2024)Generation of Novel Fall Animation with Configurable AttributesProceedings of the 9th International Conference on Movement and Computing10.1145/3658852.3659087(1-6)Online publication date: 30-May-2024
  • (2024)Flexible Motion In-betweening with Diffusion ModelsACM SIGGRAPH 2024 Conference Papers10.1145/3641519.3657414(1-9)Online publication date: 13-Jul-2024
  • (2024)Hierarchical Semantics Alignment for 3D Human Motion RetrievalProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3626772.3657804(1083-1092)Online publication date: 10-Jul-2024
  • (2024)State of the Art on Diffusion Models for Visual ComputingComputer Graphics Forum10.1111/cgf.1506343:2Online publication date: 30-Apr-2024
  • (2024)MotionGPT: Human Motion Synthesis with Improved Diversity and Realism via GPT-3 Prompting2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)10.1109/WACV57701.2024.00499(5058-5068)Online publication date: 3-Jan-2024
  • (2024)Few-shot generative model for skeleton-based human action synthesis using cross-domain adversarial learning2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)10.1109/WACV57701.2024.00390(3934-3943)Online publication date: 3-Jan-2024
  • (2024)Using LLMs to Animate Interactive Story Characters with Emotions and Personality2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW)10.1109/VRW62533.2024.00124(632-635)Online publication date: 16-Mar-2024
