
Variable selection in discrete survival models

MSc (Statistics) / Department of Statistics / Variable selection is vital in high-dimensional statistical modelling, as it aims to identify the right subset model. However, variable selection in discrete survival analysis poses many challenges because of the complicated data structure. Survival data may contain unobserved heterogeneity, which leads to biased estimates when it is not taken into account, and conventional variable selection methods have stability problems. A simulation approach was used to assess and compare the performance of the Least Absolute Shrinkage and Selection Operator (Lasso) and gradient boosting on discrete survival data. Parameter-related mean squared errors (MSEs) and false positive rates suggest that Lasso performs better than gradient boosting. Frailty models outperform discrete survival models that do not account for unobserved heterogeneity. The two methods were also applied to Zimbabwe Demographic Health Survey (ZDHS) 2016 data on age at first marriage, where they did not select exactly the same variables: gradient boosting retained more variables in the model. Based on Lasso, place of residence, highest educational level attained and age cohort are the most influential factors for age at first marriage in Zimbabwe. / NRF

Identifier: oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:univen/oai:univendspace.univen.ac.za:11602/1552
Date: 27 February 2020
Creators: Mabvuu, Coster
Contributors: Bere, A., Sigauke, C.
Source Sets: South African National ETD Portal
Language: English
Detected Language: English
Type: Dissertation
Format: 1 online resource (xviii, 83 leaves), application/pdf
Rights: University of Venda
