Geometric Features Enhanced Human-Object Interaction Detection
Authors
Zhu, Manli; Ho, Edmond S. L.; Chen, Shuang; Yang, Longzhi; Shum, Hubert P. H.
Abstract
Cameras are essential vision instruments for capturing images for pattern detection and measurement. Human–object interaction (HOI) detection is one of the most popular pattern detection approaches for captured human-centric visual scenes. Recently, Transformer-based models have become the dominant approach for HOI detection owing to their advanced network architectures and promising results. However, most of them follow the one-stage design of the vanilla Transformer, leaving rich geometric priors underexploited and leading to compromised performance, especially when occlusion occurs. Given that geometric features tend to outperform visual ones in occluded scenarios and offer information that complements visual cues, we propose a novel end-to-end Transformer-style HOI detection model, i.e., geometric features enhanced HOI detector (GeoHOI). One key part of the model is a new unified self-supervised keypoint learning method named UniPointNet, which bridges the gap in consistent keypoint representation across diverse object categories, including humans. GeoHOI upgrades a Transformer-based HOI detector in two ways: keypoint similarities measure the likelihood of HOIs, and local keypoint patches enhance the interaction query representation, which together boost HOI predictions. Extensive experiments show that the proposed method outperforms the state-of-the-art models on V-COCO and achieves competitive performance on HICO-DET. Case study results on postdisaster rescue with vision-based instruments showcase the applicability of the proposed GeoHOI in real-world applications.
Citation
Zhu, M., Ho, E. S. L., Chen, S., Yang, L., & Shum, H. P. H. (2024). Geometric Features Enhanced Human-Object Interaction Detection. IEEE Transactions on Instrumentation and Measurement, 73, Article 5026014. https://doi.org/10.1109/TIM.2024.3427800
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Jun 9, 2024 |
Online Publication Date | Jul 16, 2024 |
Publication Date | 2024 |
Deposit Date | Jun 14, 2024 |
Publicly Available Date | Jul 16, 2024 |
Journal | IEEE Transactions on Instrumentation and Measurement |
Print ISSN | 0018-9456 |
Electronic ISSN | 1557-9662 |
Publisher | Institute of Electrical and Electronics Engineers |
Peer Reviewed | Peer Reviewed |
Volume | 73 |
Article Number | 5026014 |
DOI | https://doi.org/10.1109/TIM.2024.3427800 |
Public URL | https://durham-repository.worktribe.com/output/2483265 |
Files
Accepted Journal Article
(13.8 Mb)
PDF
Licence
http://creativecommons.org/licenses/by/4.0/
Copyright Statement
This accepted manuscript is licensed under the Creative Commons Attribution 4.0 licence. https://creativecommons.org/licenses/by/4.0/