J.J. Kurcius
Using Compressed Audio-visual Words for Multi-modal Scene Classification
Kurcius, J.J.; Breckon, T.P.
Abstract
We present a novel approach to scene classification using combined audio signal and video image features and compare this methodology to scene classification results using each modality in isolation. Each modality is represented using summary features, namely Mel-frequency Cepstral Coefficients (MFCC) for audio and the Scale-Invariant Feature Transform (SIFT) for video, within a multi-resolution bag-of-features model. Uniquely, we extend the classical bag-of-words approach over both audio and video feature spaces and introduce compressive sensing as a novel methodology for multi-modal fusion via audio-visual feature dimensionality reduction. We perform evaluation over a range of environments, showing performance that is both comparable to the state of the art (86%, over ten scene classes) and invariant to a ten-fold dimensionality reduction within the audio-visual feature space using our compressive representation approach.
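The fusion idea summarised in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single-resolution bag-of-words histogram per modality, a Gaussian random projection matrix as the compressive step, and random stand-in data in place of real MFCC and SIFT descriptors. The function names, codebook sizes (100 audio words, 400 visual words) and descriptor counts are hypothetical and chosen only to show a ten-fold reduction from 500 to 50 dimensions.

```python
import numpy as np


def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword; return an L1-normalised histogram."""
    # Squared Euclidean distances via |x|^2 - 2 x.c + |c|^2 (avoids a large 3-D array).
    d2 = (
        (descriptors ** 2).sum(axis=1)[:, None]
        - 2.0 * descriptors @ codebook.T
        + (codebook ** 2).sum(axis=1)[None, :]
    )
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    return hist / max(hist.sum(), 1.0)


def random_projection_matrix(n_in, n_out, rng):
    """Gaussian random matrix scaled so squared norms are preserved in expectation."""
    return rng.normal(0.0, 1.0 / np.sqrt(n_out), size=(n_out, n_in))


def compress_audio_visual(audio_hist, visual_hist, projection):
    """Concatenate the two histograms and project them to a lower dimension."""
    joint = np.concatenate([audio_hist, visual_hist])
    return projection @ joint


rng = np.random.default_rng(0)

# Hypothetical codebooks and descriptors (random stand-ins, not real features).
audio_codebook = rng.normal(size=(100, 13))     # 100 audio words, 13-D MFCC-like vectors
visual_codebook = rng.normal(size=(400, 128))   # 400 visual words, 128-D SIFT-like vectors
audio_desc = rng.normal(size=(250, 13))
visual_desc = rng.normal(size=(600, 128))

audio_hist = bow_histogram(audio_desc, audio_codebook)
visual_hist = bow_histogram(visual_desc, visual_codebook)

# Ten-fold reduction of the joint 500-D audio-visual word histogram.
projection = random_projection_matrix(500, 50, rng)
compressed = compress_audio_visual(audio_hist, visual_hist, projection)
print(compressed.shape)  # (50,)
```

In a pipeline of this kind, the compressed vector would then be passed to a classifier in place of the full joint histogram. The same projection matrix has to be reused for every training and test sample.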
Citation
Kurcius, J., & Breckon, T. (2014, November). Using Compressed Audio-visual Words for Multi-modal Scene Classification. Presented at Proc. International Workshop on Computational Intelligence for Multimedia Understanding
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | Proc. International Workshop on Computational Intelligence for Multimedia Understanding |
Publication Date | 2014 |
Deposit Date | Dec 9, 2014 |
Publicly Available Date | Feb 4, 2015 |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 1-5 |
Book Title | Proc. International Workshop on Computational Intelligence for Multimedia Understanding |
DOI | https://doi.org/10.1109/IWCIM.2014.7008808 |
Keywords | multi-resolution, bag of words, MFCC, compressed sensing, audio-visual, multi-modal, random projection matrix |
Public URL | https://durham-repository.worktribe.com/output/1153679 |
Publisher URL | https://breckon.org/toby/publications/papers/kurcius14audiovisual.pdf |
Files
Accepted Conference Proceeding (PDF, 411 KB)
Copyright Statement
© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.