Distillation of human–object interaction contexts for action recognition
Almushyti, Muna; Li, Frederick W.B.
Authors
Muna Almushyti muna.i.almushyti@durham.ac.uk
PGR Student, Doctor of Philosophy
Dr Frederick Li frederick.li@durham.ac.uk
Associate Professor
Abstract
Modeling spatial-temporal relations is imperative for recognizing human actions, especially when a human interacts with objects and multiple objects appear around the human differently over time. Most existing action recognition models focus on learning the overall visual cues of a scene but disregard a holistic view of human–object relationships and interactions, that is, how a human interacts with objects for both short-term task completion and long-term goals. We therefore argue that human action recognition can be improved by exploiting both the local and global contexts of human–object interactions (HOIs). In this paper, we propose the Global-Local Interaction Distillation Network (GLIDN), which learns human and object interactions through space and time via knowledge distillation for holistic HOI understanding. GLIDN encodes humans and objects as graph nodes and learns their local and global relations via graph attention networks. The local context graphs capture the relations between humans and objects at the frame level by modeling their co-occurrence at each time step. The global relation graph is constructed over video-level human and object interactions, identifying their long-term relations throughout a video sequence. We also investigate how knowledge from each graph can be distilled to its counterpart to improve HOI recognition. Finally, we evaluate our model through comprehensive experiments on two datasets, Charades and CAD-120. Our method outperforms the baselines and counterpart approaches.
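Since the record contains no code, the following is a minimal PyTorch sketch of the mechanism the abstract describes: graph attention over human and object nodes at the frame level (local context) and over the whole clip (global context), with a distillation term that encourages the two contexts to agree. All names (`GraphAttentionLayer`, `GLIDNSketch`), the choice of a KL-divergence distillation loss, and the feature dimensions are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only; module names, the KL distillation term, and dimensions
# are assumptions, not the authors' released GLIDN code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionLayer(nn.Module):
    """Single-head attention over a fully connected set of human/object nodes."""

    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, nodes):               # nodes: (N, dim)
        q, k, v = self.query(nodes), self.key(nodes), self.value(nodes)
        attn = F.softmax(q @ k.t() / nodes.size(-1) ** 0.5, dim=-1)
        return attn @ v                      # updated node features: (N, dim)


class GLIDNSketch(nn.Module):
    """Local (frame-level) and global (video-level) HOI graphs with distillation."""

    def __init__(self, dim=256, num_classes=157):  # 157 = Charades action classes
        super().__init__()
        self.local_gat = GraphAttentionLayer(dim)   # relations within each frame
        self.global_gat = GraphAttentionLayer(dim)  # relations across the whole clip
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, node_feats):           # node_feats: (T, N, dim), human+object boxes per frame
        T, N, dim = node_feats.shape
        # Local context: attention among the humans/objects co-occurring in each frame.
        local_ctx = torch.stack([self.local_gat(node_feats[t]) for t in range(T)])
        # Global context: attention over all nodes in the video, capturing long-term relations.
        global_ctx = self.global_gat(node_feats.reshape(T * N, dim)).reshape(T, N, dim)
        # Distillation: push the local context toward the global one (softened KL, one direction).
        distill_loss = F.kl_div(
            F.log_softmax(local_ctx, dim=-1),
            F.softmax(global_ctx, dim=-1),
            reduction="batchmean",
        )
        logits = self.classifier((local_ctx + global_ctx).mean(dim=(0, 1)))
        return logits, distill_loss


# Toy usage: 8 frames, 5 detected humans/objects per frame, 256-dim node features.
model = GLIDNSketch()
logits, distill_loss = model(torch.randn(8, 5, 256))
print(logits.shape, distill_loss.item())
```

The paper describes distilling knowledge between the graphs in both directions; the single KL term above is only meant to illustrate the idea in a few lines.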
Citation
Almushyti, M., & Li, F. W. (2022). Distillation of human–object interaction contexts for action recognition. Computer Animation and Virtual Worlds, 33(5), Article e2107. https://doi.org/10.1002/cav.2107
| Journal Article Type | Article |
|---|---|
| Acceptance Date | Jul 3, 2022 |
| Online Publication Date | Aug 10, 2022 |
| Publication Date | Oct 11, 2022 |
| Deposit Date | Oct 12, 2022 |
| Publicly Available Date | Oct 12, 2022 |
| Journal | Computer Animation and Virtual Worlds |
| Print ISSN | 1546-4261 |
| Electronic ISSN | 1546-427X |
| Publisher | Wiley |
| Peer Reviewed | Peer Reviewed |
| Volume | 33 |
| Issue | 5 |
| Article Number | e2107 |
| DOI | https://doi.org/10.1002/cav.2107 |
| Public URL | https://durham-repository.worktribe.com/output/1189021 |
Files
Published Journal Article (PDF, 2.3 MB)
Publisher Licence URL: http://creativecommons.org/licenses/by-nc-nd/4.0/
Copyright Statement
© 2022 The Authors. Computer Animation and Virtual Worlds published by John Wiley & Sons Ltd.
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.