Maps from Motion (MfM): Generating 2D Semantic Maps from Sparse Multi-view Images
Authors
Toso, Matteo; Fiorini, Stefano; James, Stuart; Del Bue, Alessio
Abstract
Detailed worldwide 2D maps require enormous collective effort. OpenStreetMap is the result of 11 million registered users manually annotating the GPS location of over 1.75 billion entries, including distinctive landmarks and common urban objects. At the same time, manual annotations can include errors and are slow to update, limiting the map's accuracy. Maps from Motion (MfM) is a step toward automating this time-consuming map-making procedure by computing 2D maps of semantic objects directly from a collection of uncalibrated multi-view images.
From each image, we extract a set of object detections and estimate their spatial arrangement in a top-down local map centered in the reference frame of the camera that captured the image. Aligning these local maps is not a trivial problem, since they provide incomplete, noisy fragments of the scene, and matching detections across them is unreliable because of the presence of repeated patterns and the limited appearance variability of urban objects. We address this with a novel graph-based framework that encodes the spatial and semantic distribution of the objects detected in each image and learns how to combine them to predict the objects' poses in a global reference system, while taking into account all possible detection matches and preserving the topology observed in each image. Despite the complexity of the problem, our best model achieves global 2D registration with an average accuracy within 4 meters (i.e., below GPS accuracy) even on sparse sequences with strong viewpoint change, on which COLMAP has an 80% failure rate. We provide an extensive evaluation on synthetic and real-world data, showing how the method obtains a solution even in scenarios where standard optimization techniques fail.
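To make the alignment problem concrete, the sketch below registers one camera-centered local map to a global frame with a classical least-squares rigid fit (the Kabsch/Procrustes solution in 2D). This is not the graph-based MfM model: it assumes noise-free positions and known detection matches, which the abstract notes are unreliable in practice. The function name `fit_rigid_2d` and all coordinates are hypothetical, chosen for illustration only.

```python
import numpy as np


def fit_rigid_2d(src, dst):
    """Least-squares rotation R and translation t with dst_i ~ R @ src_i + t.

    Assumes row i of `src` and row i of `dst` are the same object,
    i.e. the detection matches are already known (illustrative assumption).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    # 2x2 cross-covariance of the centred point sets
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)
    # Force det(R) = +1 so the result is a rotation, not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t


# Hypothetical scene: four objects in the global frame (meters)
objects_global = np.array([[0.0, 0.0], [4.0, 1.0], [2.0, 6.0], [-3.0, 2.0]])

# Hypothetical camera pose: heading of 30 degrees, position (5, -2)
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta), np.cos(theta)]])
t_true = np.array([5.0, -2.0])

# Local top-down map: the same objects expressed in the camera's frame,
# local_i = R_true^T @ (global_i - t_true), written here in row form
objects_local = (objects_global - t_true) @ R_true

# Recover the camera pose by aligning the local map to the global one
R_est, t_est = fit_rigid_2d(objects_local, objects_global)
objects_registered = objects_local @ R_est.T + t_est
```

With exact matches and no noise, `R_est` and `t_est` reproduce the hypothetical camera pose. Once matches are ambiguous or the local maps are noisy and incomplete, a closed-form fit of this kind no longer applies directly, which is the setting the abstract targets.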
Citation
Toso, M., Fiorini, S., James, S., & Del Bue, A. (2025, March). Maps from Motion (MfM): Generating 2D Semantic Maps from Sparse Multi-view Images. Presented at International Conference on 3D Vision (3DV), Singapore
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | International Conference on 3D Vision (3DV) |
Start Date | Mar 25, 2025 |
End Date | Mar 28, 2025 |
Acceptance Date | Nov 6, 2024 |
Deposit Date | Nov 25, 2024 |
Publicly Available Date | Nov 26, 2024 |
Journal | arXiv |
Peer Reviewed | Peer Reviewed |
Public URL | https://durham-repository.worktribe.com/output/3105498 |
Files
Published Journal Article
(1.8 Mb)
PDF
Publisher Licence URL
http://creativecommons.org/licenses/by-nc-nd/4.0/