Learning Multimodal VAEs Through Mutual Supervision
Joy, Tom; Shi, Yuge; Torr, Philip H.S.; Rainforth, Tom; Schmon, Sebastian M.; Siddharth, N.
Authors
Tom Joy
Yuge Shi
Philip H.S. Torr
Tom Rainforth
Sebastian M. Schmon
N. Siddharth
Abstract
Multimodal VAEs seek to model the joint distribution over heterogeneous data (e.g. vision, language), whilst also capturing a shared representation across such modalities. Prior work has typically combined information from the modalities by reconciling idiosyncratic representations directly in the recognition model through explicit products, mixtures, or other such factorisations. Here we introduce a novel alternative, the MEME, that avoids such explicit combinations by repurposing semi-supervised VAEs to combine information between modalities implicitly through mutual supervision. This formulation naturally allows learning from partially-observed data where some modalities can be entirely missing, something that most existing approaches either cannot handle or handle only to a limited extent. We demonstrate that MEME outperforms baselines on standard metrics across both partial and complete observation schemes on the MNIST-SVHN (image-image) and CUB (image-text) datasets. We also contrast the quality of the representations learnt by mutual supervision against standard approaches and observe interesting trends in its ability to capture relatedness between data.
Citation
Joy, T., Shi, Y., Torr, P. H. S., Rainforth, T., Schmon, S. M., & Siddharth, N. (2022). Learning Multimodal VAEs Through Mutual Supervision.
Conference Name | ICLR 2022: The Tenth International Conference on Learning Representations |
---|---|
Conference Location | Virtual |
Start Date | Apr 25, 2022 |
End Date | Apr 29, 2022 |
Acceptance Date | Jan 20, 2022 |
Online Publication Date | Sep 29, 2021 |
Publication Date | 2022 |
Deposit Date | Jun 24, 2022 |
Publicly Available Date | Jun 24, 2022 |
Publisher URL | https://openreview.net/forum?id=1xXvPrAshao |