Learning Multimodal VAEs Through Mutual Supervision

Joy, Tom; Shi, Yuge; Torr, Philip H.S.; Rainforth, Tom; Schmon, Sebastian M.; Siddharth, N.

Authors

Tom Joy

Yuge Shi

Philip H.S. Torr

Tom Rainforth

Sebastian M. Schmon

N. Siddharth

Abstract

Multimodal VAEs seek to model the joint distribution over heterogeneous data (e.g. vision, language), whilst also capturing a shared representation across such modalities. Prior work has typically combined information from the modalities by reconciling idiosyncratic representations directly in the recognition model through explicit products, mixtures, or other such factorisations. Here we introduce a novel alternative, the MEME, that avoids such explicit combinations by repurposing semi-supervised VAEs to combine information between modalities implicitly through mutual supervision. This formulation naturally allows learning from partially-observed data where some modalities can be entirely missing, something that most existing approaches either cannot handle or handle only to a limited extent. We demonstrate that MEME outperforms baselines on standard metrics across both partial and complete observation schemes on the MNIST-SVHN (image–image) and CUB (image–text) datasets. We also contrast the quality of the representations learnt by mutual supervision against standard approaches and observe interesting trends in its ability to capture relatedness between data.

Citation

Joy, T., Shi, Y., Torr, P. H., Rainforth, T., Schmon, S. M., & Siddharth, N. (2022, April). Learning Multimodal VAEs Through Mutual Supervision. Presented at ICLR 2022: The Tenth International Conference on Learning Representations, Virtual

Presentation Conference Type	Conference Paper (published)
Conference Name	ICLR 2022: The Tenth International Conference on Learning Representations
Start Date	Apr 25, 2022
End Date	Apr 29, 2022
Acceptance Date	Jan 20, 2022
Online Publication Date	Sep 29, 2021
Publication Date	2022
Deposit Date	Jun 24, 2022
Publicly Available Date	Jun 24, 2022
Public URL	https://durham-repository.worktribe.com/output/1136777
Publisher URL	https://openreview.net/forum?id=1xXvPrAshao