Muna Almushyti muna.i.almushyti@durham.ac.uk
PGR Student Doctor of Philosophy
STIT: Spatio-Temporal Interaction Transformers for Human-Object Interaction Recognition in Videos
Almushyti, Muna; Li, Frederick W.B.
Authors
Dr Frederick Li frederick.li@durham.ac.uk
Associate Professor
Abstract
Recognizing human-object interactions is challenging due to their spatio-temporal changes. We propose the Spatio-Temporal Interaction Transformer-based (STIT) network to reason about such changes. Specifically, spatial transformers learn the context of humans and objects at a specific frame time. A temporal transformer then learns higher-level relations between the spatial context representations at different time steps, capturing long-term dependencies across frames. We further investigate multiple hierarchy designs for learning human interactions. We achieved superior performance on the Charades, Something-Something v1 and CAD-120 datasets compared with baseline models that do not learn human-object relations and with prior graph-based networks. We also achieved a state-of-the-art accuracy of 95.93% on the CAD-120 dataset [1] using RGB data only.
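The two-level idea described in the abstract (attention over humans and objects within each frame, then attention over the resulting frame-level contexts) can be illustrated with a small sketch. This is not the authors' implementation: it assumes single-head scaled dot-product attention with identity projections, mean pooling to summarize each level, and made-up feature vectors. Learned weights, positional information, and the classification head are all omitted, and the names `self_attention`, `mean_pool`, and `encode_clip` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections), so there are no learned parameters here.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def mean_pool(tokens):
    """Average a list of equally sized vectors into one vector."""
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]


def encode_clip(video):
    """Hierarchical encoding of a clip.

    video: list of frames; each frame is a list of entity feature vectors
    (one per detected human or object). Frames may hold different numbers
    of entities.
    """
    # Spatial level: relate humans and objects within each frame.
    frame_contexts = [mean_pool(self_attention(frame)) for frame in video]
    # Temporal level: relate the per-frame contexts across time steps.
    return mean_pool(self_attention(frame_contexts))


# Hypothetical 4-dimensional features for a 3-frame clip.
video = [
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0], [1.0, 1.0, 0.0, 0.0]],
    [[0.5, 0.5, 0.5, 0.5], [0.0, 1.0, 1.0, 0.0]],
]
clip_vector = encode_clip(video)
print(len(clip_vector))
```

In a trained model the clip-level vector would feed a classifier over interaction labels. The sketch only shows how the spatial and temporal stages compose.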
Citation
Almushyti, M., & Li, F. W. (2022). STIT: Spatio-Temporal Interaction Transformers for Human-Object Interaction Recognition in Videos. https://doi.org/10.1109/icpr56361.2022.9956030
Presentation Conference Type | Conference Paper (Published) |
---|---|
Conference Name | 2022 26th International Conference on Pattern Recognition (ICPR) |
Start Date | Aug 21, 2022 |
End Date | Aug 25, 2022 |
Acceptance Date | May 17, 2022 |
Publication Date | 2022-11 |
Deposit Date | Oct 31, 2022 |
Publicly Available Date | Nov 1, 2022 |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 3287-3294 |
DOI | https://doi.org/10.1109/icpr56361.2022.9956030 |
Public URL | https://durham-repository.worktribe.com/output/1135752 |
Related Public URLs | https://doi.org/10.1109/ICPR56361.2022.9956030 |
Additional Information | 21-25 Aug. 2022 |
Files
Accepted Conference Proceeding (PDF, 1.4 MB)
Copyright Statement
© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.