Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model
Wang, Yin; Leng, Zhiying; Li, Frederick W. B.; Wu, Shun-Cheng; Liang, Xiaohui
Dr Frederick Li email@example.com
Text-driven human motion generation is a significant and challenging task in computer vision. However, current methods are limited to producing either deterministic or imprecise motion sequences, and fail to effectively control the temporal and spatial relationships required to conform to a given text description. In this work, we propose a fine-grained method for generating high-quality human motion sequences conditioned on precise text descriptions. Our approach consists of two key components: 1) a linguistics-structure assisted module that constructs accurate and complete language features to fully utilize text information; and 2) a context-aware progressive reasoning module that learns neighborhood and overall semantic linguistic features from shallow and deep graph neural networks to achieve multi-step inference. Experiments show that our approach outperforms text-driven motion generation methods on the HumanML3D and KIT test sets and generates motions that are visually confirmed to conform better to the text conditions.
Wang, Y., Leng, Z., Li, F. W. B., Wu, S., & Liang, X. (in press). Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model.
| Conference Name | The International Conference on Computer Vision (ICCV) 2023 |
| Start Date | Oct 2, 2023 |
| End Date | Oct 6, 2023 |
| Acceptance Date | Aug 11, 2023 |
| Deposit Date | Sep 12, 2023 |
| Publicly Available Date | Sep 30, 2023 |
| Publisher | Institute of Electrical and Electronics Engineers |
This file is under embargo due to copyright reasons.