Research Repository

Outputs (5)

Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting (2025)
Presentation / Conference Contribution
Miao, X., Duan, H., Qian, Q., Wang, J., Long, Y., Shao, L., Zhao, D., Xu, R., & Zhang, G. (2025, October). Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting. Presented at International Conference on Computer Vision, ICCV 2025, Honolulu, Hawaii

Rethinking Score Distillation Sampling for 3D Editing and Generation (2025)
Presentation / Conference Contribution
Miao, X., Duan, H., Long, Y., & Han, J. (2025, July). Rethinking Score Distillation Sampling for 3D Editing and Generation. Presented at ICML 2025: International Conference on Machine Learning, Vancouver

Score Distillation Sampling (SDS) has emerged as a prominent method for text-to-3D generation by leveraging the strengths of 2D diffusion models. However, SDS is limited to generation tasks and lacks the capability to edit existing 3D assets. Convers...
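
For context, SDS (introduced in DreamFusion) optimizes a 3D representation with parameters θ by nudging its renders x = g(θ) toward high probability under a frozen 2D diffusion prior; the standard gradient, which this line of work revisits, is

\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\big[\, w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y, t) - \epsilon\big)\, \partial x / \partial \theta \,\big], \qquad x = g(\theta),

where x_t is the render noised to timestep t, \hat{\epsilon}_\phi is the frozen diffusion model's noise prediction conditioned on the text prompt y, and w(t) is a timestep weighting.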

Laser: Efficient Language-Guided Segmentation in Neural Radiance Fields (2025)
Journal Article
Miao, X., Duan, H., Bai, Y., Shah, T., Song, J., Long, Y., Ranjan, R., & Shao, L. (2025). Laser: Efficient Language-Guided Segmentation in Neural Radiance Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(5), 3922-3934. https://doi.org/10.1109/TPAMI.2025.3535916

In this work, we propose a method that leverages CLIP feature distillation, achieving efficient 3D segmentation through language guidance. Unlike previous methods that rely on multi-scale CLIP features and are limited by processing speed and storage...
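
As a rough sketch of the general recipe (distill CLIP image features into a 3D feature field, then threshold similarity against a CLIP text embedding at query time; hypothetical helper names, not the paper's code):

import torch.nn.functional as F

def distillation_loss(rendered_feat, clip_feat):
    # Cosine-similarity distillation between per-pixel features rendered
    # from a 3D feature field and CLIP image features of the training view.
    # rendered_feat, clip_feat: (N, D) tensors.
    rendered = F.normalize(rendered_feat, dim=-1)
    target = F.normalize(clip_feat, dim=-1)
    return (1.0 - (rendered * target).sum(dim=-1)).mean()

def language_guided_mask(rendered_feat, text_emb, threshold=0.5):
    # Mark pixels whose rendered feature aligns with a CLIP text embedding.
    # text_emb: (D,) tensor from a CLIP text encoder.
    sims = F.normalize(rendered_feat, dim=-1) @ F.normalize(text_emb, dim=0)
    return sims > threshold

This is only the generic skeleton; Laser's contribution, per the abstract, is making such a pipeline efficient in speed and storage rather than relying on multi-scale CLIP features.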

CTNeRF: Cross-time Transformer for dynamic neural radiance field from monocular video (2024)
Journal Article
Miao, X., Bai, Y., Duan, H., Wan, F., Huang, Y., Long, Y., & Zheng, Y. (2024). CTNeRF: Cross-time Transformer for dynamic neural radiance field from monocular video. Pattern Recognition, 156, Article 110729. https://doi.org/10.1016/j.patcog.2024.110729

The goal of our work is to generate high-quality novel views from monocular videos of complex and dynamic scenes. Prior methods, such as DynamicNeRF, have shown impressive performance by leveraging time-varying dynamic radiance fields. However, thes...
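
The basic object behind such methods is a radiance field conditioned on time, so each query returns color and density for a specific frame; a minimal illustrative sketch (not CTNeRF's architecture, which additionally aggregates multi-frame features with a cross-time transformer):

import torch
import torch.nn as nn

class DynamicRadianceField(nn.Module):
    # Minimal time-conditioned NeRF: maps a 3D point, view direction, and
    # timestamp to (RGB, density).
    def __init__(self, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + 3 + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # RGB + sigma
        )

    def forward(self, xyz, view_dir, t):
        h = self.mlp(torch.cat([xyz, view_dir, t], dim=-1))
        return torch.sigmoid(h[..., :3]), torch.relu(h[..., 3:])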

DS-Depth: Dynamic and Static Depth Estimation via a Fusion Cost Volume (2023)
Journal Article
Miao, X., Bai, Y., Duan, H., Huang, Y., Wan, F., Xu, X., Long, Y., & Zheng, Y. (2024). DS-Depth: Dynamic and Static Depth Estimation via a Fusion Cost Volume. IEEE Transactions on Circuits and Systems for Video Technology, 34(4), 2564-2576. https://doi.org/10.1109/TCSVT.2023.3305776

Self-supervised monocular depth estimation methods typically rely on the reprojection error to capture geometric relationships between successive frames in static environments. However, this assumption does not hold for dynamic objects in such scenarios, l...
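
For reference, the reprojection error in question is the standard self-supervised photometric loss: back-project target pixels with predicted depth, transform them into the source view with the relative pose, sample the source image there, and compare. A simplified L1 sketch with assumed tensor shapes and pinhole intrinsics K (methods like DS-Depth add SSIM terms and explicitly handle the dynamic objects that plain reprojection mis-models):

import torch
import torch.nn.functional as F

def reprojection_loss(target, source, depth, K, K_inv, T):
    # target, source: (B, 3, H, W) images; depth: (B, 1, H, W);
    # K, K_inv: (B, 3, 3) intrinsics; T: (B, 4, 4) target-to-source pose.
    B, _, H, W = target.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=target.dtype, device=target.device),
        torch.arange(W, dtype=target.dtype, device=target.device),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).view(1, 3, -1)
    cam = depth.view(B, 1, -1) * (K_inv @ pix)  # back-project to 3D
    ones = torch.ones(B, 1, H * W, dtype=cam.dtype, device=cam.device)
    src_cam = (T @ torch.cat([cam, ones], dim=1))[:, :3]  # to source frame
    src_pix = K @ src_cam
    src_pix = src_pix[:, :2] / src_pix[:, 2:].clamp(min=1e-6)
    # normalize pixel coordinates to [-1, 1] for grid_sample
    grid_x = 2 * src_pix[:, 0] / (W - 1) - 1
    grid_y = 2 * src_pix[:, 1] / (H - 1) - 1
    grid = torch.stack([grid_x, grid_y], dim=-1).view(B, H, W, 2)
    warped = F.grid_sample(source, grid, align_corners=True)
    return (target - warped).abs().mean()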