Research Repository

Outputs (39)

From Category to Scenery: An End-to-End Framework for Multi-Person Human-Object Interaction Recognition in Videos (2024)
Presentation / Conference Contribution
Qiao, T., Li, R., Li, F. W. B., & Shum, H. P. H. (2024, December). From Category to Scenery: An End-to-End Framework for Multi-Person Human-Object Interaction Recognition in Videos. Presented at the 2024 International Conference on Pattern Recognition, Kolkata, India

Video-based Human-Object Interaction (HOI) recognition explores the intricate dynamics between humans and objects, which are essential for a comprehensive understanding of human behavior and intentions. While previous work has made significant stride...

Self-Regulated Sample Diversity in Large Language Models (2024)
Presentation / Conference Contribution
Liu, M., Frawley, J., Wyer, S., Shum, H. P. H., Uckelman, S. L., Black, S., & Willcocks, C. G. (2024). Self-Regulated Sample Diversity in Large Language Models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics (1891–1899)

U3DS3: Unsupervised 3D Semantic Scene Segmentation (2024)
Presentation / Conference Contribution
Liu, J., Yu, Z., Breckon, T. P., & Shum, H. P. H. (2024). U3DS3: Unsupervised 3D Semantic Scene Segmentation. In 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (3747-3756). https://doi.org/10.1109/WACV57701.2024.00372

Contemporary point cloud segmentation approaches largely rely on richly annotated 3D training data. However, it is both time-consuming and challenging to obtain consistently accurate annotations for such 3D scene data. Moreover, there is still a lac...

Two-Person Interaction Augmentation with Skeleton Priors (2024)
Presentation / Conference Contribution
Li, B., Ho, E. S. L., Shum, H. P. H., & Wang, H. (2024, June). Two-Person Interaction Augmentation with Skeleton Priors. Presented at 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, Washington

A Virtual Reality Framework for Human-Driver Interaction Research: Safe and Cost-Effective Data Collection (2024)
Presentation / Conference Contribution
Crosato, L., Wei, C., Ho, E. S. L., Shum, H. P. H., & Sun, Y. (2024). A Virtual Reality Framework for Human-Driver Interaction Research: Safe and Cost-Effective Data Collection. In HRI '24: Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction (167-174). https://doi.org/10.1145/3610977.3634923

The advancement of automated driving technology has led to new challenges in the interaction between automated vehicles and human road users. However, there is currently no complete theory that explains how human road users interact with vehicles, an...

Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code Diffusion using Transformers (2023)
Presentation / Conference Contribution
Corona-Figueroa, A., Bond-Taylor, S., Bhowmik, N., Gaus, Y. F. A., Breckon, T. P., Shum, H. P., & Willcocks, C. G. (2023). Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code Diffusion using Transformers. In ICCV '23: Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. https://doi.org/10.1109/ICCV51070.2023.01341

Generating 3D images of complex objects conditionally from a few 2D views is a difficult synthesis problem, compounded by issues such as domain gap and geometric misalignment. For instance, a unified framework such as Generative Adversarial Networks...

Hard No-Box Adversarial Attack on Skeleton-Based Human Action Recognition with Skeleton-Motion-Informed Gradient (2023)
Presentation / Conference Contribution
Lu, Z., Wang, H., Chang, Z., Yang, G., & Shum, H. P. (2023). Hard No-Box Adversarial Attack on Skeleton-Based Human Action Recognition with Skeleton-Motion-Informed Gradient. https://doi.org/10.1109/ICCV51070.2023.00424

Recently, methods for skeleton-based human activity recognition have been shown to be vulnerable to adversarial attacks. However, these attack methods require either the full knowledge of the victim (i.e. white-box attacks), access to training data (...

Unifying Human Motion Synthesis and Style Transfer with Denoising Diffusion Probabilistic Models (2023)
Presentation / Conference Contribution
Chang, Z., Findlay, E. J., Zhang, H., & Shum, H. P. (2023). Unifying Human Motion Synthesis and Style Transfer with Denoising Diffusion Probabilistic Models. In Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - GRAPP (64-74). https://doi.org/10.5220/0011631000003417

Generating realistic motions for digital humans is a core but challenging part of computer animations and games, as human motions are both diverse in content and rich in styles. While the latest deep learning approaches have made significant advancem...

Tackling Data Bias in Painting Classification with Style Transfer (2023)
Presentation / Conference Contribution
Vijendran, M., Li, F. W., & Shum, H. P. (2023). Tackling Data Bias in Painting Classification with Style Transfer. In Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP (250-261). https://doi.org/10.5220/0011776600003417

It is difficult to train classifiers on painting collections due to model bias from domain gaps and data bias from the uneven distribution of artistic styles. Previous techniques like data distillation, traditional data augmentation and style transf...

A Mixed Reality Training System for Hand-Object Interaction in Simulated Microgravity Environments (2023)
Presentation / Conference Contribution
Zhou, K., Chen, C., Ma, Y., Leng, Z., Shum, H. P., Li, F. W., & Liang, X. (2023). A Mixed Reality Training System for Hand-Object Interaction in Simulated Microgravity Environments. In 2023 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). https://doi.org/10.1109/ISMAR59233.2023.00031

As human exploration of space continues to progress, the use of Mixed Reality (MR) for simulating microgravity environments and facilitating training in hand-object interaction holds immense practical significance. However, hand-object interaction in...

Enhancing Perception and Immersion in Pre-Captured Environments through Learning-Based Eye Height Adaptation (2023)
Presentation / Conference Contribution
Feng, Q., Shum, H. P., & Morishima, S. (2023). Enhancing Perception and Immersion in Pre-Captured Environments through Learning-Based Eye Height Adaptation. In 2023 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). https://doi.org/10.1109/ISMAR59233.2023.00055

Pre-captured immersive environments using omnidirectional cameras provide a wide range of virtual reality applications. Previous research has shown that manipulating the eye height in egocentric virtual environments can significantly affect distance...

Correlation-Distance Graph Learning for Treatment Response Prediction from rs-fMRI (2023)
Presentation / Conference Contribution
Zhang, X., Zheng, S., Shum, H. P., Zhang, H., Song, N., Song, M., & Jia, H. (2023). Correlation-Distance Graph Learning for Treatment Response Prediction from rs-fMRI. In Neural Information Processing 30th International Conference, ICONIP 2023, Changsha, China, November 20–23, 2023, Proceedings, Part IX (298-312). https://doi.org/10.1007/978-981-99-8138-0_24

Resting-state fMRI (rs-fMRI) functional connectivity (FC) analysis provides valuable insights into the relationships between different brain regions and their potential implications for neurological or psychiatric disorders. However, specific design...

Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation (2023)
Presentation / Conference Contribution
Li, L., Shum, H. P., & Breckon, T. P. (2023). Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/CVPR52729.2023.00903

Whilst the availability of 3D LiDAR point cloud data has significantly grown in recent years, annotation remains expensive and time-consuming, leading to a demand for semisupervised semantic segmentation methods with application domains such as auton...

Region-based Appearance and Flow Characteristics for Anomaly Detection in Infrared Surveillance Imagery (2023)
Presentation / Conference Contribution
Gaus, Y., Bhowmik, N., Isaac-Medina, B., Atapour-Abarghouei, A., Shum, H., & Breckon, T. (2023). Region-based Appearance and Flow Characteristics for Anomaly Detection in Infrared Surveillance Imagery. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). https://doi.org/10.1109/CVPRW59228.2023.00301

Anomaly detection is a classical problem within automated visual surveillance, namely the determination of the normal from the abnormal when operational data availability is highly biased towards one class (normal) due to both insufficient sample siz...

Denoising Diffusion Probabilistic Models for Styled Walking Synthesis (2022)
Presentation / Conference Contribution
Findlay, E., Zhang, H., Chang, Z., & Shum, H. P. (2022). Denoising Diffusion Probabilistic Models for Styled Walking Synthesis. https://doi.org/10.1145/3561975

Generating realistic motions for digital humans is time-consuming for many graphics applications. Data-driven motion synthesis approaches have seen solid progress in recent years through deep generative models. These results offer high-quality motion...

UAV-ReID: A Benchmark on Unmanned Aerial Vehicle Re-Identification in Video Imagery (2022)
Presentation / Conference Contribution
Organisciak, D., Poyser, M., Alsehaim, A., Hu, S., Isaac-Medina, B. K., Breckon, T. P., & Shum, H. P. (2022). UAV-ReID: A Benchmark on Unmanned Aerial Vehicle Re-Identification in Video Imagery. https://doi.org/10.5220/0010836600003124

As unmanned aerial vehicles (UAV) become more accessible with a growing range of applications, the risk of UAV disruption increases. Recent development in deep learning allows vision-based counter-UAV systems to detect and track UAVs with a single ca...

A Skeleton-aware Graph Convolutional Network for Human-Object Interaction Detection (2022)
Presentation / Conference Contribution
Zhu, M., Ho, E. S., & Shum, H. P. (2022). A Skeleton-aware Graph Convolutional Network for Human-Object Interaction Detection. https://doi.org/10.1109/smc53654.2022.9945149

Detecting human-object interactions is essential for comprehensive understanding of visual scenes. In particular, spatial connections between humans and objects are important cues for reasoning interactions. To this end, we propose a skeleton-aware g...

Towards Graph Representation Learning Based Surgical Workflow Anticipation (2022)
Presentation / Conference Contribution
Zhang, X., Al Moubayed, N., & Shum, H. P. (2022). Towards Graph Representation Learning Based Surgical Workflow Anticipation. https://doi.org/10.1109/bhi56158.2022.9926801

Surgical workflow anticipation can give predictions on what steps to conduct or what instruments to use next, which is an essential part of the computer-assisted intervention system for surgery, e.g. workflow reasoning in robotic surgery. However, cu...

A Feasibility Study on Image Inpainting for Non-cleft Lip Generation from Patients with Cleft Lip (2022)
Presentation / Conference Contribution
Chen, S., Atapour-Abarghouei, A., Kerby, J., Ho, E. S., Sainsbury, D. C., Butterworth, S., & Shum, H. P. (2022). A Feasibility Study on Image Inpainting for Non-cleft Lip Generation from Patients with Cleft Lip. https://doi.org/10.1109/bhi56158.2022.9926917

A cleft lip is a congenital abnormality requiring surgical repair by a specialist. The surgeon must have extensive experience and theoretical knowledge to perform surgery, and Artificial Intelligence (AI) methods have been proposed to guide surgeons in...

Geometric Features Informed Multi-person Human-object Interaction Recognition in Videos (2022)
Presentation / Conference Contribution
Qiao, T., Men, Q., Li, F. W., Kubotani, Y., Morishima, S., & Shum, H. P. (2022). Geometric Features Informed Multi-person Human-object Interaction Recognition in Videos. https://doi.org/10.1007/978-3-031-19772-7_28

Human-Object Interaction (HOI) recognition in videos is important for analysing human activity. Most existing work focusing on visual features usually suffers from occlusion in real-world scenarios. Such a problem will be further complicated when...

Multiclass-SGCN: Sparse Graph-based Trajectory Prediction with Agent Class Embedding (2022)
Presentation / Conference Contribution
Li, R., Katsigiannis, S., & Shum, H. P. (2022). Multiclass-SGCN: Sparse Graph-based Trajectory Prediction with Agent Class Embedding. In 2022 IEEE International Conference on Image Processing (ICIP) Proceedings (2346-2350). https://doi.org/10.1109/icip46576.2022.9897644

Trajectory prediction of road users in real-world scenarios is challenging because their movement patterns are stochastic and complex. Previous pedestrian-oriented works have been successful in modelling the complex interactions among pedestrians, bu...

Pose-based Tremor Classification for Parkinson’s Disease Diagnosis from Video (2022)
Presentation / Conference Contribution
Zhang, X., Zhang, H., & Shum, H. P. (2022). Pose-based Tremor Classification for Parkinson’s Disease Diagnosis from Video. https://doi.org/10.1007/978-3-031-16440-8_47

Parkinson’s disease (PD) is a progressive neurodegenerative disorder that results in a variety of motor dysfunction symptoms, including tremors, bradykinesia, rigidity and postural instability. The diagnosis of PD mainly relies on clinical experience...

MedNeRF: Medical Neural Radiance Fields for Reconstructing 3D-aware CT-Projections from a Single X-ray (2022)
Presentation / Conference Contribution
Corona-Figueroa, A., Frawley, J., Bond-Taylor, S., Bethapudi, S., Shum, H. P., & Willcocks, C. G. (2022). MedNeRF: Medical Neural Radiance Fields for Reconstructing 3D-aware CT-Projections from a Single X-ray. https://doi.org/10.1109/embc48229.2022.9871757

Computed tomography (CT) is an effective medical imaging modality, widely used in the field of clinical medicine for the diagnosis of various pathologies. Advances in Multidetector CT imaging technology have enabled additional functionalities, inclu...

Cerebral Palsy Prediction with Frequency Attention Informed Graph Convolutional Networks (2022)
Presentation / Conference Contribution
Zhang, H., Shum, H. P., & Ho, E. S. (2022). Cerebral Palsy Prediction with Frequency Attention Informed Graph Convolutional Networks. https://doi.org/10.1109/embc48229.2022.9871230

Early diagnosis and intervention are clinically considered the paramount part of treating cerebral palsy (CP), so it is essential to design an efficient and interpretable automatic prediction system for CP. We highlight a significant difference betwe...

360 Depth Estimation in the Wild - The Depth360 Dataset and the SegFuse Network (2022)
Presentation / Conference Contribution
Feng, Q., Shum, H. P., & Morishima, S. (2022). 360 Depth Estimation in the Wild - The Depth360 Dataset and the SegFuse Network. https://doi.org/10.1109/vr51125.2022.00087

Single-view depth estimation from omnidirectional images has gained popularity with its wide range of applications such as autonomous driving and scene reconstruction. Although data-driven learning-based methods demonstrate significant potential in t...

Bi-projection-based Foreground-aware Omnidirectional Depth Prediction (2021)
Presentation / Conference Contribution
Feng, Q., Shum, H. P., & Morishima, S. (2021). Bi-projection-based Foreground-aware Omnidirectional Depth Prediction.

Due to the increasing availability of commercial 360-degree cameras, accurate depth prediction for omnidirectional images can be beneficial to a wide range of applications including video editing and augmented reality. Regarding existing methods, so...

Semantics-STGCNN: A Semantics-guided Spatial-Temporal Graph Convolutional Network for Multi-class Trajectory Prediction (2021)
Presentation / Conference Contribution
Rainbow, B. A., Men, Q., & Shum, H. P. (2021). Semantics-STGCNN: A Semantics-guided Spatial-Temporal Graph Convolutional Network for Multi-class Trajectory Prediction. https://doi.org/10.1109/smc52423.2021.9658781

Predicting the movement trajectories of multiple classes of road users in real-world scenarios is a challenging task due to the diverse trajectory patterns. While recent works of pedestrian trajectory prediction successfully modelled the influence of...

DurLAR: A High-Fidelity 128-Channel LiDAR Dataset with Panoramic Ambient and Reflectivity Imagery for Multi-Modal Autonomous Driving Applications (2021)
Presentation / Conference Contribution
Li, L., Ismail, K. N., Shum, H. P., & Breckon, T. P. (2021). DurLAR: A High-Fidelity 128-Channel LiDAR Dataset with Panoramic Ambient and Reflectivity Imagery for Multi-Modal Autonomous Driving Applications. https://doi.org/10.1109/3dv53792.2021.00130

We present DurLAR, a high-fidelity 128-channel 3D LiDAR dataset with panoramic ambient (near infrared) and reflectivity imagery, as well as a sample benchmark task using depth estimation for autonomous driving applications. Our driving platform is eq...

Unmanned Aerial Vehicle Visual Detection and Tracking using Deep Neural Networks: A Performance Benchmark (2021)
Presentation / Conference Contribution
Isaac-Medina, B. K., Poyser, M., Organisciak, D., Willcocks, C. G., Breckon, T. P., & Shum, H. P. (2021). Unmanned Aerial Vehicle Visual Detection and Tracking using Deep Neural Networks: A Performance Benchmark. https://doi.org/10.1109/iccvw54120.2021.00142

Unmanned Aerial Vehicles (UAV) can pose a major risk for aviation safety, due to both negligent and malicious use. For this reason, the automated detection and tracking of UAV is a fundamental task in aerial security systems. Common technologies for...

STGAE: Spatial-Temporal Graph Auto-Encoder for Hand Motion Denoising (2021)
Presentation / Conference Contribution
Zhou, K., Cheng, Z., Shum, H. P., Li, F. W., & Liang, X. (2021). STGAE: Spatial-Temporal Graph Auto-Encoder for Hand Motion Denoising. https://doi.org/10.1109/ismar52148.2021.00018

Hand-object interaction in mixed reality (MR) relies on the accurate tracking and estimation of human hands, which provide users with a sense of immersion. However, raw captured hand motion data always contains errors such as joints occlusion, disloc...

Human-centric Autonomous Driving in an AV-Pedestrian Interactive Environment Using SVO (2021)
Presentation / Conference Contribution
Crosato, L., Wei, C., Ho, E. S., & Shum, H. P. (2021). Human-centric Autonomous Driving in an AV-Pedestrian Interactive Environment Using SVO. https://doi.org/10.1109/ichms53169.2021.9582640

As Autonomous Vehicles (AV) are becoming a reality, the design of efficient motion control algorithms will have to deal with the unpredictable and interactive nature of other road users. Current AV motion planning algorithms suffer from the freezing...

Interpreting Deep Learning based Cerebral Palsy Prediction with Channel Attention (2021)
Presentation / Conference Contribution
Zhu, M., Men, Q., Ho, E. S., Leung, H., & Shum, H. P. (2021). Interpreting Deep Learning based Cerebral Palsy Prediction with Channel Attention. https://doi.org/10.1109/bhi50953.2021.9508619

Early prediction of cerebral palsy is essential as it leads to early treatment and monitoring. Deep learning has shown promising results in biomedical engineering thanks to its capacity of modelling complicated data with its non-linear architecture....

Stable Hand Pose Estimation under Tremor via Graph Neural Network (2021)
Presentation / Conference Contribution
Leng, Z., Chen, J., Shum, H. P., Li, F. W., & Liang, X. (2021). Stable Hand Pose Estimation under Tremor via Graph Neural Network. In 2021 IEEE Virtual Reality and 3D User Interfaces (VR) (226-234). https://doi.org/10.1109/vr50410.2021.00044

Hand pose estimation, which predicts the spatial location of hand joints, is a fundamental task in VR/AR applications. Although existing methods can recover hand pose competently, the tremor issue occurring in hand motion has not been completely solv...

Makeup Style Transfer on Low-quality Images with Weighted Multi-scale Attention (2021)
Presentation / Conference Contribution
Organisciak, D., Ho, E. S., & Shum, H. P. (2021). Makeup Style Transfer on Low-quality Images with Weighted Multi-scale Attention. https://doi.org/10.1109/icpr48806.2021.9412604

Facial makeup style transfer is an extremely challenging sub-field of image-to-image translation. Due to this difficulty, state-of-the-art results are mostly reliant on the Face Parsing Algorithm, which segments a face into parts in order to easily e...

A Two-Stream Recurrent Network for Skeleton-based Human Interaction Recognition (2021)
Presentation / Conference Contribution
Men, Q., Ho, E. S., Shum, H. P., & Leung, H. (2021). A Two-Stream Recurrent Network for Skeleton-based Human Interaction Recognition. https://doi.org/10.1109/icpr48806.2021.9412538

This paper addresses the problem of recognizing human-human interaction from skeletal sequences. Existing methods are mainly designed to classify single human action. Many of them simply stack the movement features of two characters to deal with huma...