Multi-Task Spatial-Temporal Graph Auto-Encoder for Hand Motion Denoising
Zhou, Kanglei; Shum, Hubert P.H.; Li, Frederick W.B.; Liang, Xiaohui
Authors
- Kanglei Zhou
- Professor Hubert Shum (hubert.shum@durham.ac.uk)
- Dr Frederick Li (frederick.li@durham.ac.uk), Associate Professor
- Xiaohui Liang
Abstract
In many human-computer interaction applications, fast and accurate hand tracking is necessary for an immersive experience. However, raw hand motion data can be flawed due to issues such as joint occlusions and high-frequency noise, hindering interaction. Because relying only on the current motion can introduce lag, predicting future movement is crucial for a faster response. Our solution is the Multi-task Spatial-Temporal Graph Auto-Encoder (Multi-STGAE), a model that accurately denoises and predicts hand motion by exploiting the inter-dependency of the two tasks: denoising stabilizes the prediction, while prediction preserves motion dynamics, avoiding over-smoothed results and alleviating time delays. A gate mechanism is integrated to prevent negative transfer between the tasks and further boost multi-task performance. Multi-STGAE also includes a spatial-temporal graph auto-encoder block, which models hand structure and motion coherence through graph convolutional networks, reducing noise while preserving hand physiology. Additionally, we design a novel hand partition strategy and a hand bone loss to improve natural hand motion generation. To validate our method, we contribute two large-scale datasets, generated from two benchmark datasets with a data corruption algorithm, and propose two structural metrics to evaluate the natural characteristics of the denoised and predicted hand motion. Experimental results show that our method outperforms the state-of-the-art, showcasing how the multi-task framework enables mutual benefits between denoising and prediction.
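To make the abstract's architecture concrete, the sketch below is a minimal PyTorch illustration of the ideas it names: a shared spatial-temporal graph encoder, separate denoising and prediction decoders, a sigmoid gate controlling cross-task feature sharing, and a bone-length loss. This is not the authors' implementation; the class names, layer sizes, gate design, joint count, and the identity adjacency used in the shape check are all assumptions for illustration. See the paper (DOI below) for the actual architecture.

```python
# Minimal sketch (assumptions throughout), not the Multi-STGAE reference code.
import torch
import torch.nn as nn

J = 21  # number of hand joints; 21 is a common hand-skeleton size (an assumption here)

class STGraphConv(nn.Module):
    """One spatial-temporal block: graph convolution over joints, then a temporal convolution."""
    def __init__(self, in_ch: int, out_ch: int, adj: torch.Tensor):
        super().__init__()
        self.register_buffer("adj", adj)          # (J, J) normalized hand-skeleton adjacency
        self.spatial = nn.Linear(in_ch, out_ch)   # per-joint feature transform
        self.temporal = nn.Conv2d(out_ch, out_ch, kernel_size=(3, 1), padding=(1, 0))
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, J, C)
        x = torch.einsum("vw,btwc->btvc", self.adj, self.spatial(x))  # aggregate graph neighbors
        x = self.temporal(x.permute(0, 3, 1, 2)).permute(0, 2, 3, 1)  # convolve along time
        return self.act(x)

class MultiTaskSTGAE(nn.Module):
    """Shared ST-graph encoder feeding a denoising decoder and a prediction decoder;
    a sigmoid gate weighs how much feature content the two tasks exchange,
    standing in for the paper's negative-transfer prevention."""
    def __init__(self, adj: torch.Tensor, ch: int = 64):
        super().__init__()
        self.encoder = STGraphConv(3, ch, adj)
        self.task_denoise = nn.Linear(ch, ch)     # task-specific feature branches
        self.task_predict = nn.Linear(ch, ch)
        self.gate = nn.Sequential(nn.Linear(2 * ch, ch), nn.Sigmoid())
        self.dec_denoise = STGraphConv(ch, 3, adj)
        self.dec_predict = STGraphConv(ch, 3, adj)

    def forward(self, noisy: torch.Tensor):       # noisy: (B, T, J, 3) joint positions
        h = self.encoder(noisy)
        hd, hp = self.task_denoise(h), self.task_predict(h)
        g = self.gate(torch.cat([hd, hp], dim=-1))            # per-feature sharing weights
        denoised = self.dec_denoise(g * hd + (1 - g) * hp)    # gated cross-task exchange
        predicted = self.dec_predict(g * hp + (1 - g) * hd)
        return denoised, predicted

def bone_length_loss(pred, target, bones):
    """Penalize bone-length deviation; `bones` is a list of (parent, child) joint indices."""
    def lengths(x):  # x: (B, T, J, 3) -> (B, T, len(bones))
        return torch.stack([(x[..., c, :] - x[..., p, :]).norm(dim=-1)
                            for p, c in bones], dim=-1)
    return (lengths(pred) - lengths(target)).abs().mean()

# Shape check with a trivial identity adjacency (not a real hand skeleton):
model = MultiTaskSTGAE(torch.eye(J))
den, pre = model(torch.randn(2, 16, J, 3))  # both outputs are (2, 16, 21, 3)
```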
Citation
Zhou, K., Shum, H. P., Li, F. W., & Liang, X. (2024). Multi-Task Spatial-Temporal Graph Auto-Encoder for Hand Motion Denoising. IEEE Transactions on Visualization and Computer Graphics, 30(10), 6754-6769. https://doi.org/10.1109/TVCG.2023.3337868
Journal Article Type | Article |
---|---|
Acceptance Date | Nov 27, 2023 |
Online Publication Date | Nov 30, 2023 |
Publication Date | Oct 2024 |
Deposit Date | Nov 29, 2023 |
Publicly Available Date | Nov 30, 2023 |
Journal | IEEE Transactions on Visualization and Computer Graphics |
Print ISSN | 1077-2626 |
Electronic ISSN | 1941-0506 |
Publisher | Institute of Electrical and Electronics Engineers |
Peer Reviewed | Peer Reviewed |
Volume | 30 |
Issue | 10 |
Pages | 6754-6769 |
DOI | https://doi.org/10.1109/TVCG.2023.3337868 |
Public URL | https://durham-repository.worktribe.com/output/1962816 |
Files
Accepted Journal Article (4.5 MB, PDF)
Licence: http://creativecommons.org/licenses/by/4.0/
Copyright Statement: This accepted manuscript is licensed under the Creative Commons Attribution 4.0 licence (https://creativecommons.org/licenses/by/4.0/).