Multi-Task Spatial-Temporal Graph Auto-Encoder for Hand Motion Denoising

Zhou, Kanglei; Shum, Hubert P.H.; Li, Frederick W.B.; Liang, Xiaohui

doi:10.1109/TVCG.2023.3337868

Authors

Kanglei Zhou

Professor Hubert Shum hubert.shum@durham.ac.uk
Professor

Dr Frederick Li frederick.li@durham.ac.uk
Associate Professor

Xiaohui Liang

Abstract

In many human-computer interaction applications, fast and accurate hand tracking is necessary for an immersive experience. However, raw hand motion data can be flawed due to issues such as joint occlusions and high-frequency noise, hindering the interaction. Using only current motion for interaction can lead to lag, so predicting future movement is crucial for a faster response. Our solution is the Multi-task Spatial-Temporal Graph Auto-Encoder (Multi-STGAE), a model that accurately denoises and predicts hand motion by exploiting the inter-dependency of both tasks. The model ensures a stable and accurate prediction through denoising while maintaining motion dynamics to avoid over-smoothed motion and alleviate time delays through prediction. A gate mechanism is integrated to prevent negative transfer between tasks and further boost multi-task performance. Multi-STGAE also includes a spatial-temporal graph autoencoder block, which models hand structures and motion coherence through graph convolutional networks, reducing noise while preserving hand physiology. Additionally, we design a novel hand partition strategy and hand bone loss to improve natural hand motion generation. We validate the effectiveness of our proposed method by contributing two large-scale datasets with a data corruption algorithm based on two benchmark datasets. To evaluate the natural characteristics of the denoised and predicted hand motion, we propose two structural metrics. Experimental results show that our method outperforms the state-of-the-art, showcasing how the multitask framework enables mutual benefits between denoising and prediction.

Citation

Zhou, K., Shum, H. P., Li, F. W., & Liang, X. (2024). Multi-Task Spatial-Temporal Graph Auto-Encoder for Hand Motion Denoising. IEEE Transactions on Visualization and Computer Graphics, 30(10), 6754-6769. https://doi.org/10.1109/TVCG.2023.3337868

Journal Article Type	Article
Acceptance Date	Nov 27, 2023
Online Publication Date	Nov 30, 2023
Publication Date	2024-10
Deposit Date	Nov 29, 2023
Publicly Available Date	Nov 30, 2023
Journal	IEEE Transactions on Visualization and Computer Graphics
Print ISSN	1077-2626
Electronic ISSN	1941-0506
Publisher	Institute of Electrical and Electronics Engineers
Peer Reviewed	Peer Reviewed
Volume	30
Issue	10
Pages	6754-6769
DOI	https://doi.org/10.1109/TVCG.2023.3337868
Public URL	https://durham-repository.worktribe.com/output/1962816

Files

Accepted Journal Article (4.5 MB)
PDF

Licence
http://creativecommons.org/licenses/by/4.0/

Copyright Statement
This accepted manuscript is licensed under the Creative Commons Attribution 4.0 licence. https://creativecommons.org/licenses/by/4.0/