
Research Repository


Outputs (9)

CLPFusion: A Latent Diffusion Model Framework for Realistic Chinese Landscape Painting Style Transfer (2025)
Journal Article
Pan, J., Li, F. W. B., Yang, B., & Nan, F. (2025). CLPFusion: A Latent Diffusion Model Framework for Realistic Chinese Landscape Painting Style Transfer. Computer Animation and Virtual Worlds, 36(3), Article e70053. https://doi.org/10.1002/cav.70053

This study focuses on transforming real-world scenery into Chinese landscape painting masterpieces through style transfer. Traditional methods using convolutional neural networks (CNNs) and generative adversarial networks (GANs) often yield inconsist...

PHI: Bridging Domain Shift in Long-Term Action Quality Assessment via Progressive Hierarchical Instruction (2025)
Journal Article
Zhou, K., Shum, H. P. H., Li, F. W. B., Zhang, X., & Liang, X. (2025). PHI: Bridging Domain Shift in Long-Term Action Quality Assessment via Progressive Hierarchical Instruction. IEEE Transactions on Image Processing, https://doi.org/10.1109/TIP.2025.3574938

Long-term Action Quality Assessment (AQA) aims to evaluate the quantitative performance of actions in long videos. However, existing methods face challenges due to domain shifts between the pre-trained large-scale action recognition backbones and the...

Geometric Visual Fusion Graph Neural Networks for Multi-Person Human-Object Interaction Recognition in Videos (2025)
Journal Article
Qiao, T., Li, R., Li, F. W. B., Kubotani, Y., Morishima, S., & Shum, H. P. H. (2025). Geometric Visual Fusion Graph Neural Networks for Multi-Person Human-Object Interaction Recognition in Videos. Expert Systems with Applications, 290, Article 128344. https://doi.org/10.1016/j.eswa.2025.128344

Human-Object Interaction (HOI) recognition in videos requires understanding both visual patterns and geometric relationships as they evolve over time. Visual and geometric features offer complementary strengths. Visual features capture appearance con...

Talking Face Generation with Lip and Identity Priors (2025)
Journal Article
Wu, J., Li, F. W. B., Tam, G. K. L., Yang, B., Nan, F., & Pan, J. (2025). Talking Face Generation with Lip and Identity Priors. Computer Animation and Virtual Worlds, 36(3), Article e70026. https://doi.org/10.1002/cav.70026

Speech-driven talking face video generation has attracted growing interest in recent research. While person-specific approaches yield high-fidelity results, they require extensive training data from each individual speaker. In contrast, general-purpo...

WDFSR: Normalizing Flow based on Wavelet-Domain for Super-Resolution (2025)
Journal Article
Song, C., Li, S., Li, F. W. B., & Yang, B. (2025). WDFSR: Normalizing Flow based on Wavelet-Domain for Super-Resolution. Computational Visual Media, 11(2), 381-404.

We propose a Normalizing flow based on the wavelet framework for super-resolution called WDFSR. It learns the conditional distribution mapping between low-resolution images in the RGB domain and high-resolution images in the wavelet domain to generat...

Multi-modal Dynamic Point Cloud Geometric Compression Based on Bidirectional Recurrent Scene Flow* (2025)
Presentation / Conference Contribution
Nan, F., Li, F., Wang, Z., Tam, G. K. L., Jiang, Z., DongZheng, D., & Yang, B. (2025, April). Multi-modal Dynamic Point Cloud Geometric Compression Based on Bidirectional Recurrent Scene Flow*. Presented at ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India

Deep learning methods have recently shown significant promise in compressing the geometric features of point clouds. However, challenges arise when consecutive point clouds contain holes, resulting in incomplete information that complicates motion es...

Fg-T2M++: LLMs-Augmented Fine-Grained Text Driven Human Motion Generation (2025)
Journal Article
Wang, Y., Li, M., Liu, J., Leng, Z., Li, F. W. B., Zhang, Z., & Liang, X. (2025). Fg-T2M++: LLMs-Augmented Fine-Grained Text Driven Human Motion Generation. International Journal of Computer Vision, 133, 4277-4293. https://doi.org/10.1007/s11263-025-02392-9

We address the challenging problem of fine-grained text-driven human motion generation. Existing works generate imprecise motions that fail to accurately capture relationships specified in text due to: (1) lack of effective text parsing for detailed...

3D data augmentation and dual-branch model for robust face forgery detection (2025)
Journal Article
Zhou, C., Li, F. W., Song, C., Zheng, D., & Yang, B. (2025). 3D data augmentation and dual-branch model for robust face forgery detection. Graphical Models, 138, Article 101255. https://doi.org/10.1016/j.gmod.2025.101255

We propose Dual-Branch Network (DBNet), a novel deepfake detection framework that addresses key limitations of existing works by jointly modeling 3D-temporal and fine-grained texture representations. Specifically, we aim to investigate how to (1) cap...