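Analysis of logistic regression models with sparse data and highly correlated covariates / 高相關性變項在發生事件數小時,邏輯斯模式係數估計及變項選擇方法比較 (Comparison of coefficient estimation and variable selection methods for logistic models with highly correlated variables when the number of events is small)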

Master's / National Yang-Ming University / Institute of Public Health / 104 / Background: In neuroimaging studies, such as those examining the association between brain magnetic resonance imaging (MRI) and dementia, data often contain many clustered predictors. When the number of dementia cases or the overall sample size is small, valid and reliable parameter estimation and variable selection are challenging. These conditions are increasingly common in gerontological studies. Few studies have evaluated the performance of techniques widely available in medical research when both a small number of events and a large number of clustered predictors are present. In the present study, we evaluated different estimation approaches and variable selection strategies in the context of logistic regression models with many clustered predictors and small sample sizes.
Methods: We simulated brain MRI data with small numbers of events to evaluate the performance of two estimators, the maximum likelihood estimator and Firth's penalized likelihood, under different variable selection methods, including stepwise selection and LASSO. For parameter estimation, we compared 95% coverage rates, convergence rates, percentage bias, mean squared error (MSE), and ratios of variances. For variable selection, we considered how frequently the correct model was selected under scenarios with different sets of candidate variables (the variables of interest included among another 5, 15, 25, 95, or 195 candidate variables with null coefficients), with varying numbers of clusters (4, 6, 12, or 24), and with and without variable selection algorithms. The event/non-event (E/NE) ratios considered were 90/10, 80/20, 70/30, 60/40, 50/50, 40/60, 30/70, 20/80, and 10/90, and two highly correlated variables with non-null coefficients were considered.
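The estimation criteria named above can be illustrated with a minimal sketch. It assumes that each simulation replicate yields one coefficient estimate and one standard error, and that coverage is judged with Wald intervals. The function name `summarize_simulation` and its arguments are hypothetical and are not taken from the thesis; convergence rates and ratios of variances are left out.

```python
def summarize_simulation(estimates, std_errors, true_beta, z=1.96):
    """Summarize replicate estimates of one coefficient from a simulation.

    Hypothetical helper for illustration; true_beta must be non-zero
    because percentage bias is expressed relative to the true value.
    """
    n = len(estimates)
    mean_est = sum(estimates) / n
    # Percentage bias: mean deviation from the truth, relative to the truth.
    percent_bias = 100.0 * (mean_est - true_beta) / true_beta
    # MSE: average squared distance between each estimate and the truth.
    mse = sum((b - true_beta) ** 2 for b in estimates) / n
    # Coverage: share of Wald intervals (estimate +/- z * SE) containing the truth.
    covered = sum(
        1
        for b, se in zip(estimates, std_errors)
        if b - z * se <= true_beta <= b + z * se
    )
    return {"percent_bias": percent_bias, "mse": mse, "coverage": covered / n}


# Four made-up replicates of a coefficient whose true value is 1.0.
summary = summarize_simulation(
    estimates=[0.8, 1.0, 1.2, 1.6],
    std_errors=[0.1, 0.1, 0.1, 0.1],
    true_beta=1.0,
)
```

A coverage rate close to 0.95 indicates that the nominal 95% intervals behave as intended. The same summary could be computed with likelihood-based intervals by replacing the Wald bounds with profile-likelihood bounds.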
Results and Conclusions: When events or non-events are sparse (e.g., E/NE of 90/10 or 10/90) and the main predictor is continuous, logistic regression with Firth's penalized likelihood and likelihood-based confidence intervals performs better than the commonly used maximum likelihood estimator with Wald confidence intervals. As the number of null-coefficient candidate variables increased, the percentage bias increased and the 95% coverage rate worsened. Although no variable selection method was best in every scenario, the stepwise method performed better than LASSO when the overall sample size was small (n=100). Additionally, the stepwise method was good at capturing the correct clusters because it selected more clusters than the other methods, whereas LASSO often achieved a higher proportion of correct clusters among all the clusters it selected at larger sample sizes (n=500).
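To show why Firth's penalized likelihood stays stable when events are sparse, the following is a minimal sketch for a model with an intercept and a single predictor. It solves Firth's modified score equation by scoring iterations. The function name `firth_logistic`, the step cap, and the toy data are assumptions made for illustration; this is not the thesis's implementation, and it omits step-halving and likelihood-based confidence intervals.

```python
import math


def firth_logistic(x, y, max_iter=100, tol=1e-8, max_step=5.0):
    """Firth-penalized logistic regression with an intercept and one predictor.

    Illustrative sketch only. Returns (intercept, slope). Assumes x is not
    constant, so the 2x2 information matrix is invertible.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X'WX for the design matrix with columns [1, x].
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        d = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * d - b * b
        i00, i01, i11 = d / det, -b / det, a / det  # inverse of [[a, b], [b, d]]
        # Hat-matrix diagonals h_i = w_i * x_i' (X'WX)^{-1} x_i.
        h = [wi * (i00 + 2.0 * i01 * xi + i11 * xi * xi) for wi, xi in zip(w, x)]
        # Firth's modified score residuals: y_i - p_i + h_i * (0.5 - p_i).
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        # Scoring step, capped to keep early iterations stable.
        s0 = max(-max_step, min(max_step, i00 * u0 + i01 * u1))
        s1 = max(-max_step, min(max_step, i01 * u0 + i11 * u1))
        b0 += s0
        b1 += s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1


# Toy data: no events when x = 0, so the ordinary maximum likelihood
# estimate of the intercept diverges while the Firth estimate stays finite.
x = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
intercept, slope = firth_logistic(x, y)
```

With a single binary predictor, the Firth estimate coincides with the log odds ratio obtained after adding 0.5 to every cell of the 2x2 table, so the slope in this toy example converges to log(15.4), about 2.734.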

Identifier: oai:union.ndltd.org:TW/104YM005058030
Date: January 2016
Creators: Wan-Ni Chen, 陳宛妮
Contributors: I-Feng Lin, 林逸芬
Source Sets: National Digital Library of Theses and Dissertations in Taiwan
Language: zh-TW
Detected Language: English
Type: 學位論文 (degree thesis) ; thesis
Format: 130
