Research Repository

Dr Frederick Li's Outputs (82)

CLPFusion: A Latent Diffusion Model Framework for Realistic Chinese Landscape Painting Style Transfer (2025)
Journal Article
Pan, J., Li, F. W. B., Yang, B., & Nan, F. (2025). CLPFusion: A Latent Diffusion Model Framework for Realistic Chinese Landscape Painting Style Transfer. Computer Animation and Virtual Worlds, 36(3), Article e70053. https://doi.org/10.1002/cav.70053

This study focuses on transforming real-world scenery into Chinese landscape painting masterpieces through style transfer. Traditional methods using convolutional neural networks (CNNs) and generative adversarial networks (GANs) often yield inconsist...

PHI: Bridging Domain Shift in Long-Term Action Quality Assessment via Progressive Hierarchical Instruction (2025)
Journal Article
Zhou, K., Shum, H. P. H., Li, F. W. B., Zhang, X., & Liang, X. (2025). PHI: Bridging Domain Shift in Long-Term Action Quality Assessment via Progressive Hierarchical Instruction. IEEE Transactions on Image Processing, https://doi.org/10.1109/TIP.2025.3574938

Long-term Action Quality Assessment (AQA) aims to evaluate the quantitative performance of actions in long videos. However, existing methods face challenges due to domain shifts between the pre-trained large-scale action recognition backbones and the...

Geometric Visual Fusion Graph Neural Networks for Multi-Person Human-Object Interaction Recognition in Videos (2025)
Journal Article
Qiao, T., Li, R., Li, F. W. B., Kubotani, Y., Morishima, S., & Shum, H. P. H. (2025). Geometric Visual Fusion Graph Neural Networks for Multi-Person Human-Object Interaction Recognition in Videos. Expert Systems with Applications, 290, Article 128344. https://doi.org/10.1016/j.eswa.2025.128344

Human-Object Interaction (HOI) recognition in videos requires understanding both visual patterns and geometric relationships as they evolve over time. Visual and geometric features offer complementary strengths. Visual features capture appearance con...

Talking Face Generation with Lip and Identity Priors (2025)
Journal Article
Wu, J., Li, F. W. B., Tam, G. K. L., Yang, B., Nan, F., & Pan, J. (2025). Talking Face Generation with Lip and Identity Priors. Computer Animation and Virtual Worlds, 36(3), Article e70026. https://doi.org/10.1002/cav.70026

Speech-driven talking face video generation has attracted growing interest in recent research. While person-specific approaches yield high-fidelity results, they require extensive training data from each individual speaker. In contrast, general-purpo...

WDFSR: Normalizing Flow based on Wavelet-Domain for Super-Resolution (2025)
Journal Article
Song, C., Li, S., Li, F. W. B., & Yang, B. (2025). WDFSR: Normalizing Flow based on Wavelet-Domain for Super-Resolution. Computational Visual Media, 11(2), 381-404

We propose WDFSR, a normalizing flow based on the wavelet framework for super-resolution. It learns the conditional distribution mapping between low-resolution images in the RGB domain and high-resolution images in the wavelet domain to generat...
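The "wavelet domain" the abstract refers to comes from the discrete wavelet transform, which splits an image into a low-frequency approximation band and high-frequency detail bands. As a generic illustration of that decomposition (a minimal single-level 2D Haar transform, not the WDFSR model itself), the forward and inverse transforms can be sketched as:

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2D Haar wavelet transform of a grayscale image.

    Returns a low-frequency approximation (LL) and three high-frequency
    detail bands (LH, HL, HH), each half the size of the input.
    """
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # vertical average
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # vertical difference
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: reconstructs the original image exactly."""
    h, w = ll.shape
    a = np.empty((h, 2 * w))
    d = np.empty((h, 2 * w))
    a[:, 0::2] = ll + lh
    a[:, 1::2] = ll - lh
    d[:, 0::2] = hl + hh
    d[:, 1::2] = hl - hh
    img = np.empty((2 * h, 2 * w))
    img[0::2, :] = a + d
    img[1::2, :] = a - d
    return img
```

Because the transform is invertible, a model that predicts wavelet-domain bands (as WDFSR's conditional distribution does) can recover an RGB-domain image without information loss.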

Multi-modal Dynamic Point Cloud Geometric Compression Based on Bidirectional Recurrent Scene Flow* (2025)
Presentation / Conference Contribution
Nan, F., Li, F., Wang, Z., Tam, G. K. L., Jiang, Z., DongZheng, D., & Yang, B. (2025, April). Multi-modal Dynamic Point Cloud Geometric Compression Based on Bidirectional Recurrent Scene Flow*. Presented at ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India

Deep learning methods have recently shown significant promise in compressing the geometric features of point clouds. However, challenges arise when consecutive point clouds contain holes, resulting in incomplete information that complicates motion es...

Fg-T2M++: LLMs-Augmented Fine-Grained Text Driven Human Motion Generation (2025)
Journal Article
Wang, Y., Li, M., Liu, J., Leng, Z., Li, F. W. B., Zhang, Z., & Liang, X. (online). Fg-T2M++: LLMs-Augmented Fine-Grained Text Driven Human Motion Generation. International Journal of Computer Vision, https://doi.org/10.1007/s11263-025-02392-9

We address the challenging problem of fine-grained text-driven human motion generation. Existing works generate imprecise motions that fail to accurately capture relationships specified in text due to: (1) lack of effective text parsing for detailed...

3D data augmentation and dual-branch model for robust face forgery detection (2025)
Journal Article
Zhou, C., Li, F. W., Song, C., Zheng, D., & Yang, B. (2025). 3D data augmentation and dual-branch model for robust face forgery detection. Graphical Models, 138, Article 101255. https://doi.org/10.1016/j.gmod.2025.101255

We propose Dual-Branch Network (DBNet), a novel deepfake detection framework that addresses key limitations of existing works by jointly modeling 3D-temporal and fine-grained texture representations. Specifically, we aim to investigate how to (1) cap...

From Category to Scenery: An End-to-End Framework for Multi-Person Human-Object Interaction Recognition in Videos (2024)
Presentation / Conference Contribution
Qiao, T., Li, R., Li, F. W. B., & Shum, H. P. H. (2024, December). From Category to Scenery: An End-to-End Framework for Multi-Person Human-Object Interaction Recognition in Videos. Presented at ICPR 2024: International Conference on Pattern Recognition, Kolkata, India

Video-based Human-Object Interaction (HOI) recognition explores the intricate dynamics between humans and objects, which are essential for a comprehensive understanding of human behavior and intentions. While previous work has made significant stride...

MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment (2024)
Presentation / Conference Contribution
Zhou, K., Wang, L., Zhang, X., Shum, H. P., Li, F. W. B., Li, J., & Liang, X. (2024, September). MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment. Presented at ECCV 2024: European Conference on Computer Vision, Milan, Italy

Action Quality Assessment (AQA) evaluates diverse skills but models struggle with non-stationary data. We propose Continual AQA (CAQA) to refine models using sparse new data. Feature replay preserves memory without storing raw inputs. However, the mi...

Reducing University Students’ Exam Anxiety via Mindfulness-Based Cognitive Therapy in VR with Real-Time EEG Neurofeedback (2024)
Presentation / Conference Contribution
Pan, Z., Cristea, A. I., & Li, F. W. B. (2024, July). Reducing University Students’ Exam Anxiety via Mindfulness-Based Cognitive Therapy in VR with Real-Time EEG Neurofeedback. Presented at AIED 2024: Artificial Intelligence in Education, Recife, Brazil

This research aims to develop and evaluate a novel approach to reduce university students’ exam anxiety and teach them how to better manage it using a personalised, emotion-informed Mindfulness-Based Cognitive Therapy (MBCT) method, delivered within...

Multi-Feature Fusion Enhanced Monocular Depth Estimation With Boundary Awareness (2024)
Journal Article
Song, C., Chen, Q., Li, F. W. B., Jiang, Z., Zheng, D., Shen, Y., & Yang, B. (2024). Multi-Feature Fusion Enhanced Monocular Depth Estimation With Boundary Awareness. Visual Computer, 40, 4955–4967. https://doi.org/10.1007/s00371-024-03498-w

Self-supervised monocular depth estimation has opened up exciting possibilities for practical applications, including scene understanding, object detection, and autonomous driving, without the need for expensive depth annotations. However, traditiona...

Color Theme Evaluation through User Preference Modeling (2024)
Journal Article
Yang, B., Wei, T., Li, F. W. B., Liang, X., Deng, Z., & Fang, Y. (2024). Color Theme Evaluation through User Preference Modeling. ACM Transactions on Applied Perception, 21(3), 1-35. https://doi.org/10.1145/3665329

Color composition (or color theme) is a key factor in determining how well a piece of artwork or graphical design is perceived by humans. Although a few color harmony models have been proposed, their results are often less satisfactory since they mostl...

Multi-Style Cartoonization: Leveraging Multiple Datasets With GANs (2024)
Journal Article
Cai, J., Li, F. W. B., Nan, F., & Yang, B. (2024). Multi-Style Cartoonization: Leveraging Multiple Datasets With GANs. Computer Animation and Virtual Worlds, 35(3), Article e2269. https://doi.org/10.1002/cav.2269

Scene cartoonization aims to convert photos into stylized cartoons. While GANs can generate high-quality images, previous methods focus on individual images or single styles, ignoring relationships between datasets. We propose a novel multi-style sce...

Laplacian Projection Based Global Physical Prior Smoke Reconstruction (2024)
Journal Article
Xiao, S., Tong, C., Zhang, Q., Cen, Y., Li, F. W. B., & Liang, X. (2024). Laplacian Projection Based Global Physical Prior Smoke Reconstruction. IEEE Transactions on Visualization and Computer Graphics, 30(12), 7657-7671. https://doi.org/10.1109/tvcg.2024.3358636

We present a novel framework for reconstructing fluid dynamics in real-life scenarios. Our approach leverages sparse view images and incorporates physical priors across long series of frames, resulting in reconstructed fluids with enhanced physical c...

HSE: Hybrid Species Embedding for Deep Metric Learning (2024)
Presentation / Conference Contribution
Yang, B., Sun, H., Li, F. W. B., Chen, Z., Cai, J., & Song, C. (2023, October). HSE: Hybrid Species Embedding for Deep Metric Learning. Presented at 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France

Deep metric learning is crucial for finding an embedding function that can generalize to training and testing data, including unknown test classes. However, limited training samples restrict the model's generalization to downstream tasks. While addin...
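The embedding function the abstract describes is typically trained with a distance-based objective. As a generic illustration of deep metric learning (the standard triplet loss, not the paper's HSE method), the core objective can be sketched as:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss over embedding vectors.

    Pulls the anchor toward the positive (same-class) embedding and pushes
    it away from the negative (different-class) embedding until their
    squared-L2 distances differ by at least `margin`.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # anchor-positive distance
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # anchor-negative distance
    return np.maximum(0.0, d_pos - d_neg + margin)
```

A well-separated triplet (positive close, negative far) incurs zero loss, so training focuses on triplets that violate the margin.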

Tackling Data Bias in Painting Classification with Style Transfer (2023)
Presentation / Conference Contribution
Vijendran, M., Li, F. W., & Shum, H. P. (2023, February). Tackling Data Bias in Painting Classification with Style Transfer. Presented at VISAPP '23: 2023 International Conference on Computer Vision Theory and Applications, Lisbon, Portugal

It is difficult to train classifiers on painting collections due to model bias from domain gaps and data bias from the uneven distribution of artistic styles. Previous techniques like data distillation, traditional data augmentation and style transf...

Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model (2023)
Presentation / Conference Contribution
Wang, Y., Leng, Z., Li, F. W. B., Wu, S.-C., & Liang, X. (2023, October). Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model. Presented at 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France

Text-driven human motion generation in computer vision is both significant and challenging. However, current methods are limited to producing either deterministic or imprecise motion sequences, failing to effectively control the temporal and spatial...