Tanin Sirimongkolkasem
On regularisation methods for analysis of high dimensional data
Sirimongkolkasem, Tanin; Drikvandi, Reza
Abstract
High dimensional data are rapidly growing in many domains owing to technological advances that help collect data with a large number of variables to better understand a given phenomenon of interest. Particular examples appear in genomics, fMRI data analysis, large-scale healthcare analytics, text/image analysis and astronomy. In the last two decades, regularisation approaches have become the methods of choice for analysing such high dimensional data. This paper aims to study the performance of regularisation methods, including the recently proposed method called the de-biased lasso, for the analysis of high dimensional data under different sparse and non-sparse situations. Our investigation concerns prediction, parameter estimation and variable selection. We particularly study the effects of correlated variables, covariate location and effect size, which have not been well investigated. We find that correlated data, when associated with important variables, improve the performance of common regularisation methods in all aspects, and that the level of sparsity can be reflected not only by the number of important variables but also by their overall effect size and locations. The latter may be seen under a non-sparse data structure. We demonstrate that the de-biased lasso performs well, especially in low dimensional data; however, it still suffers from issues similar to those of classical regression methods, such as multicollinearity and multiple hypothesis testing.
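The abstract refers to regularisation methods without stating one. As an illustration only (not the authors' code or simulation design), the following is a minimal sketch of the lasso, which minimises (1/(2n))||y - Xb||^2 + lam * ||b||_1, fitted by cyclic coordinate descent with soft-thresholding. The `simulate` helper and its settings (n, p, the number k of important variables, an AR(1) column correlation `rho`, and a common `effect` size) are hypothetical choices meant to mimic the sparse, correlated, p > n situations the abstract describes; the de-biased lasso is not shown.

```python
import math
import random


def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Minimise (1/(2n))||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # r = y - X beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # rho = (1/n) x_j^T (partial residual that excludes variable j)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                # keep the full residual in sync with the updated coefficient
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def simulate(n=50, p=100, k=5, rho=0.5, effect=2.0, sigma=1.0, seed=1):
    """Hypothetical sparse design: the first k of p coefficients equal `effect`,
    and neighbouring columns are correlated through an AR(1) recursion."""
    rng = random.Random(seed)
    X = []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0)]
        for _ in range(1, p):
            row.append(rho * row[-1] + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
        X.append(row)
    beta = [effect] * k + [0.0] * (p - k)
    y = [sum(xij * bj for xij, bj in zip(row, beta)) + sigma * rng.gauss(0.0, 1.0)
         for row in X]
    return X, y, beta


if __name__ == "__main__":
    X, y, beta_true = simulate()
    n, p = len(X), len(X[0])
    # smallest penalty at which every coefficient is shrunk exactly to zero
    lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n))) / n for j in range(p))
    beta_hat = lasso_cd(X, y, lam=0.1 * lam_max)
    print("selected variables:", [j for j, b in enumerate(beta_hat) if b != 0.0])
```

Varying `rho`, `effect` and which columns carry the nonzero coefficients in such a sketch is one way to explore the correlation, effect size and covariate location effects the abstract discusses; penalties are often chosen as fractions of `lam_max`, since any `lam` at or above it returns the all-zero fit.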
Citation
Sirimongkolkasem, T., & Drikvandi, R. (2019). On regularisation methods for analysis of high dimensional data. Annals of Data Science, 6(4), 737-763. https://doi.org/10.1007/s40745-019-00209-4
Journal Article Type | Article |
---|---|
Acceptance Date | Apr 7, 2019 |
Online Publication Date | Apr 13, 2019 |
Publication Date | 2019-12 |
Deposit Date | Oct 6, 2020 |
Publicly Available Date | Nov 2, 2020 |
Journal | Annals of Data Science |
Print ISSN | 2198-5804 |
Electronic ISSN | 2198-5812 |
Publisher | Springer |
Peer Reviewed | Peer Reviewed |
Volume | 6 |
Issue | 4 |
Pages | 737-763 |
DOI | https://doi.org/10.1007/s40745-019-00209-4 |
Public URL | https://durham-repository.worktribe.com/output/1254915 |
Files
Published Journal Article (PDF, 1.6 Mb)
Publisher Licence URL
http://creativecommons.org/licenses/by/4.0/
Copyright Statement
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
You might also like
MEGH: A parametric class of general hazard models for clustered survival data
(2022)
Journal Article
A goodness-of-fit test for the random-effects distribution in mixed models
(2017)
Journal Article
Testing variance components in balanced linear growth curve models
(2011)
Journal Article
Sparse principal component analysis for natural language processing
(2020)
Journal Article