Dr Konstantinos Perrakis konstantinos.perrakis@durham.ac.uk
Assistant Professor
Scalable Bayesian regression in high dimensions with multiple data sources
Perrakis, Konstantinos; Mukherjee, Sach; The Alzheimer’s Disease Neuroimaging Initiative
Authors
Sach Mukherjee
The Alzheimer’s Disease Neuroimaging Initiative
Abstract
Applications of high-dimensional regression often involve multiple sources or types of covariates. We propose methodology for this setting, emphasizing the “wide data” regime with large total dimensionality p and sample size n≪p. We focus on a flexible ridge-type prior with shrinkage levels that are specific to each data type or source and that are set automatically by empirical Bayes. All estimation, including setting of shrinkage levels, is formulated mainly in terms of inner product matrices of size n×n. This renders computation efficient in the wide data regime and allows scaling to problems with millions of features. Furthermore, the proposed procedures are free of user-set tuning parameters. We show how sparsity can be achieved by post-processing of the Bayesian output via constrained minimization of a certain Kullback–Leibler divergence. This yields sparse solutions with adaptive, source-specific shrinkage, including a closed-form variant that scales to very large p. We present empirical results from a simulation study based on real data and a case study in Alzheimer’s disease involving millions of features and multiple data sources.
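The computational point in the abstract, that estimation can be phrased through n×n inner product matrices, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a Gaussian linear model with a ridge-type prior, fixes the source-specific shrinkage levels by hand instead of setting them by empirical Bayes, and uses hypothetical names (`ridge_multi_source`, `X_blocks`, `lambdas`). It relies on the standard identity (XᵀX + Λ)⁻¹Xᵀy = Λ⁻¹Xᵀ(XΛ⁻¹Xᵀ + I)⁻¹y, so only an n×n system is solved.

```python
import numpy as np

def ridge_multi_source(X_blocks, y, lambdas):
    """Ridge posterior mean with one shrinkage level per data source.

    Illustrative sketch only: the shrinkage levels are supplied by the
    caller (an assumption), not estimated by empirical Bayes.
    """
    n = y.shape[0]
    # One n x n inner product (Gram) matrix per source.
    grams = [X @ X.T for X in X_blocks]
    # Weighted sum of Gram matrices; its size does not depend on p.
    K = sum(G / lam for G, lam in zip(grams, lambdas))
    # Solve a single n x n linear system.
    alpha = np.linalg.solve(K + np.eye(n), y)
    # Map back to coefficients, block by block.
    return [X.T @ alpha / lam for X, lam in zip(X_blocks, lambdas)]

# Synthetic data with two sources and n much smaller than p.
rng = np.random.default_rng(0)
n, p1, p2 = 20, 50, 80
X_blocks = [rng.standard_normal((n, p1)), rng.standard_normal((n, p2))]
y = rng.standard_normal(n)
lambdas = [2.0, 10.0]

beta_blocks = ridge_multi_source(X_blocks, y, lambdas)

# Reference solution from the direct p x p system, for comparison only.
X = np.hstack(X_blocks)
penalty = np.concatenate([np.full(p1, lambdas[0]), np.full(p2, lambdas[1])])
beta_direct = np.linalg.solve(X.T @ X + np.diag(penalty), X.T @ y)
```

The two routes give the same coefficients up to numerical precision, but the first never forms a p×p matrix, which is what makes the wide data regime tractable in this kind of formulation.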
Citation
Perrakis, K., Mukherjee, S., & The Alzheimer’s Disease Neuroimaging Initiative. (2020). Scalable Bayesian regression in high dimensions with multiple data sources. Journal of Computational and Graphical Statistics, 29(1), 28-39. https://doi.org/10.1080/10618600.2019.1624294
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | May 14, 2019 |
Online Publication Date | Jul 15, 2019 |
Publication Date | 2020 |
Deposit Date | Sep 26, 2019 |
Publicly Available Date | Jul 15, 2020 |
Journal | Journal of Computational and Graphical Statistics |
Print ISSN | 1061-8600 |
Electronic ISSN | 1537-2715 |
Publisher | American Statistical Association |
Peer Reviewed | Peer Reviewed |
Volume | 29 |
Issue | 1 |
Pages | 28-39 |
DOI | https://doi.org/10.1080/10618600.2019.1624294 |
Public URL | https://durham-repository.worktribe.com/output/1290450 |
Related Public URLs | https://arxiv.org/pdf/1710.00596.pdf |
Files
Accepted Journal Article (PDF, 592 KB)
Copyright Statement
This is an Accepted Manuscript of an article published by Taylor & Francis in Journal of Computational and Graphical Statistics on 15 July 2019, available online: http://www.tandfonline.com/10.1080/10618600.2019.1624294