Research Repository

Outputs (663)

Is Unimodal Bias Always Bad for Visual Question Answering? A Medical Domain Study with Dynamic Attention (2022)
Presentation / Conference Contribution
Sun, Z., Harit, A., Cristea, A. I., Yu, J., Al Moubayed, N., & Shi, L. (2022). Is Unimodal Bias Always Bad for Visual Question Answering? A Medical Domain Study with Dynamic Attention. https://doi.org/10.1109/bigdata55660.2022.10020791

Medical visual question answering (Med-VQA) aims to answer medical questions based on the clinical images provided. This field is still in its infancy due to the complexity of the trio formed by questions, multimodal features and expert knowledge. In this...

Denoising Diffusion Probabilistic Models for Styled Walking Synthesis (2022)
Presentation / Conference Contribution
Findlay, E., Zhang, H., Chang, Z., & Shum, H. P. (2022). Denoising Diffusion Probabilistic Models for Styled Walking Synthesis. https://doi.org/10.1145/3561975

Generating realistic motions for digital humans is time-consuming for many graphics applications. Data-driven motion synthesis approaches have seen solid progress in recent years through deep generative models. These results offer high-quality motion...

Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes (2022)
Presentation / Conference Contribution
Bond-Taylor, S., Hessey, P., Sasaki, H., Breckon, T., & Willcocks, C. (2022). Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes. In ECCV 2022: Computer Vision – ECCV 2022 (170-188).

Whilst diffusion probabilistic models can generate high-quality image content, key limitations remain in terms of both generating high-resolution imagery and their associated high computational requirements. Recent Vector-Quantized image models have...

STIT: Spatio-Temporal Interaction Transformers for Human-Object Interaction Recognition in Videos (2022)
Presentation / Conference Contribution
Almushyti, M., & Li, F. W. (2022). STIT: Spatio-Temporal Interaction Transformers for Human-Object Interaction Recognition in Videos. https://doi.org/10.1109/icpr56361.2022.9956030

Recognizing human-object interactions is challenging due to their spatio-temporal changes. We propose the Spatio-Temporal Interaction Transformer-based (STIT) network to reason about such changes. Specifically, spatial transformers learn humans and objects...

Towards Graph Representation Learning Based Surgical Workflow Anticipation (2022)
Presentation / Conference Contribution
Zhang, X., Al Moubayed, N., & Shum, H. P. (2022). Towards Graph Representation Learning Based Surgical Workflow Anticipation. https://doi.org/10.1109/bhi56158.2022.9926801

Surgical workflow anticipation predicts which steps to conduct or which instruments to use next, which is an essential part of computer-assisted intervention systems for surgery, e.g. workflow reasoning in robotic surgery. However, cu...

A Feasibility Study on Image Inpainting for Non-cleft Lip Generation from Patients with Cleft Lip (2022)
Presentation / Conference Contribution
Chen, S., Atapour-Abarghouei, A., Kerby, J., Ho, E. S., Sainsbury, D. C., Butterworth, S., & Shum, H. P. (2022). A Feasibility Study on Image Inpainting for Non-cleft Lip Generation from Patients with Cleft Lip. https://doi.org/10.1109/bhi56158.2022.9926917

A cleft lip is a congenital abnormality requiring surgical repair by a specialist. The surgeon must have extensive experience and theoretical knowledge to perform the surgery, and Artificial Intelligence (AI) methods have been proposed to guide surgeons in...

Detecting Melanoma Fairly: Skin Tone Detection and Debiasing for Skin Lesion Classification (2022)
Presentation / Conference Contribution
Bevan, P. J., & Atapour-Abarghouei, A. (2022). Detecting Melanoma Fairly: Skin Tone Detection and Debiasing for Skin Lesion Classification. In K. Kamnitsas, L. Koch, M. Islam, Z. Xu, J. Cardoso, Q. Dou, …S. Tsaftaris (Eds.), DART 2022: Domain Adaptation and Representation Transfer (1-11). https://doi.org/10.1007/978-3-031-16852-9_1

Convolutional Neural Networks have demonstrated human-level performance in the classification of melanoma and other skin lesions, but evident performance disparities between differing skin tones should be addressed before widespread deployment. In th...

A Skeleton-aware Graph Convolutional Network for Human-Object Interaction Detection (2022)
Presentation / Conference Contribution
Zhu, M., Ho, E. S., & Shum, H. P. (2022). A Skeleton-aware Graph Convolutional Network for Human-Object Interaction Detection. https://doi.org/10.1109/smc53654.2022.9945149

Detecting human-object interactions is essential for a comprehensive understanding of visual scenes. In particular, spatial connections between humans and objects are important cues for reasoning about interactions. To this end, we propose a skeleton-aware g...