Implementation, evaluation and application of multiple imputation for missing data in longitudinal electronic health record research

Longitudinal electronic health records are a valuable resource for research because they contain information on many patients over long follow-up periods. Missing data are common in these records because the data were collected for clinical rather than research purposes. Analysing data with missing values can bias estimates and standard errors, resulting in invalid inferences. Multiple imputation is increasingly regarded as the standard method for handling missing data in medical research because of its practicality and flexibility under the assumption that the data are missing at random (MAR). Until now, few imputation approaches have been sufficiently flexible to account for the longitudinal and dynamic structure of electronic health records. The two-fold fully conditional specification (FCS) algorithm was proposed to impute missing values in longitudinal data, but this method has not yet been validated in the complex setting of longitudinal electronic health records. I propose to adapt, evaluate and implement the two-fold FCS algorithm to impute missing data from a large primary care database. To achieve this, I first investigate the extent and patterns of missing data in a longitudinal clinical database for health indicators associated with cardiovascular disease risk, to determine whether the MAR assumption is plausible. Additionally, I develop methods to identify and remove outliers, which can bias imputations, from data with repeated measurements before imputation. Next, I adapt and develop the two-fold FCS multiple imputation algorithm to impute missing values in longitudinal clinical data for health indicators associated with cardiovascular disease risk, and I validate the algorithm through challenging simulation studies that assess bias and precision. I then develop a new software programme that implements this adapted version of the two-fold FCS algorithm to impute missing values in longitudinal data. Finally, I apply the two-fold FCS algorithm in The Health Improvement Network (THIN) primary care database to (i) model cardiovascular disease risk and (ii) understand factors associated with greater total cholesterol reduction in patients with type II diabetes.
Date January 2015
Creators Welch, C. A.
Publisher University College London (University of London)
Source Sets Ethos UK
Detected Language English
Type Electronic Thesis or Dissertation
